Intro, Protein structure, Motifs, Motif databases, End. Last time: Probability-based methods. How to find a good root? Reliability. Reconciliation analysis.
- Shon Bruce
Transcription
1 Last time: Probability-based methods. How to find a good root? Reliability. Reconciliation analysis.
2 Today: Intro to protein structure. Motifs and domains.
5 First dogma of Bioinformatics: Sequence → structure → function. Want to avoid determining structure: expensive, difficult, sometimes impossible? Bioinfo dream: Structure from sequence! How does the protein fold?
7 Ab initio folding? Folding from sequence seems out of reach. But...:
11 What to do in silico? 1. Compromise and use what you've got: recycle structures. 2. Find and understand protein building blocks: motifs and domains. 3. Identify certain protein types: transmembrane proteins. 4. Why bother? Sequences are informative!
12 Example 1: Motifs and domains (Bjarnadottir et al., 2004). Some typical G-protein coupled receptors. Small circles: glycosylation sites. Other symbols: domains.
13 Example 2: Domains and structure
17 Our goals. Motifs: representation and use. Domains: definitions, hidden Markov models (HMM), applications, databases. PSI-BLAST: sensitive search tool. Secondary structure: in general and the TM special case.
21 Motifs: Short subsequences, DNA or AA, 5–20 positions long. Foremost application: binding sites. Motifs are grouped in families. Confused terminology. Fingerprints: combinations of motifs.
25 Motif representation
Motif V5TPXLIKE: 95 seqs, width 19. Multialignment (shortened):
MTWDNRLAAFAQNYANQRA
MTWDNRLAAYAQNYANQRI
MTWDNRLAAYAQNYANQRI
MTWDDGLAAYAQNYANQRA
VSWSTKLQAYAQSYANQRI
LTWDDQVAAYAQNYASQLA
LTWDDQVAAYAQNYASQLA
LTWDDQVAAYAQNYASQLA
VSWSTKLQGFAQSYANQRI
MSWDANLASRAQNYANSRA
VSWSTKLQAFAQNYANQRI
LRWDEKVAAYARNYANQRK
LRWDEKVAAYARNYANQRK
VSWSTKLQAFAQNYANQRI
LVWNDELAQIAQVWANQCN
LVWNDELAQIAQVWANQCN
LTWDDEVAAYAQNYVSQLA
LTWDDQVAAYAQNYASQLA
VSWSTKLQAFAQNYANQRI
LVWSDELAYIAQVWANQCQ
LVWNDELAYVAQVWANQCQ
...
Pattern notation, e.g.: [LMV]-[RSTV]-W-[DSN]-...
Profiles and PSSM, PWM
Visualize with sequence logos
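The pattern notation above can be searched for with an ordinary regular expression. The following is a minimal sketch, assuming a PROSITE-like syntax limited to the elements shown on the slide (bracketed alternatives, single letters, `-` separators) plus `x` as a wildcard; the function name `pattern_to_regex` is hypothetical and not part of any library.

```python
import re

def pattern_to_regex(pattern: str) -> str:
    """Convert a dash-separated pattern such as [LMV]-[RSTV]-W-[DSN] to a regex."""
    parts = []
    for element in pattern.split("-"):
        if element == "x":      # assumed wildcard: any residue
            parts.append(".")
        else:                   # single letter or bracketed set is already valid regex
            parts.append(element)
    return "".join(parts)

# First four columns of the V5TPXLIKE pattern shown on the slide.
regex = re.compile(pattern_to_regex("[LMV]-[RSTV]-W-[DSN]"))

# Three rows copied from the multialignment on the slide.
rows = ["MTWDNRLAAFAQNYANQRA", "VSWSTKLQAYAQSYANQRI", "LVWNDELAQIAQVWANQCN"]
hits = [bool(regex.match(row)) for row in rows]
```

All three rows start with a match, as expected for sequences the pattern was derived from. A pattern of this kind only answers yes or no, which is one reason the slides move on to profiles.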
26 Sequence logo. [Figure: sequence logo of the motif; vertical axis in bits, 0 to 4.] PSSM of PR00837A (V5TPXLIKE), 95 sequences. Height indicates conservation. (Too many details: height is the Kullback-Leibler distance to the uniform distribution.) Symbol height is proportional to frequency.
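The height of one logo column, as described on this slide, can be sketched as the Kullback-Leibler distance (in bits) between the observed residue frequencies and the uniform distribution. This is an illustrative sketch only: the function name `column_height_bits` is hypothetical, the example columns are made up, and real logo tools may apply small-sample corrections that are omitted here.

```python
import math
from collections import Counter

def column_height_bits(column: str, alphabet_size: int = 20) -> float:
    """KL divergence (bits) from the column's residue frequencies to uniform."""
    counts = Counter(column)
    n = len(column)
    uniform = 1.0 / alphabet_size
    # Residues with zero count contribute nothing to the sum.
    return sum((k / n) * math.log2((k / n) / uniform) for k in counts.values())

conserved = column_height_bits("WWWWWWWW")  # fully conserved column
mixed = column_height_bits("LMVLMVLV")      # three residues share the column
```

A fully conserved amino-acid column reaches the maximum of log2(20), about 4.3 bits, which fits the 0 to 4 axis of the logo. Within a column, each symbol would then be drawn with a height equal to its frequency times the column height.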
27 Start of translation
28 Phosphorylation site, PKA (Blom et al., 1998)
30 Profiles. Multialignments: convenient. Patterns: sparse with information. Logos are pretty pictures! Profile: matrix F with frequency information; $F_{r,c}$ is the fraction of r in position c. [Figure: DNA sequence logo (WebLogo 3.0b14), vertical axis in bits up to 2.0, with the profile table (rows A, C, G, T by position).]
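32 Profiles. [Profile table: rows A, C, G, T by position.] $F_{r,c} = n_{r,c}/n$, where $n_{r,c}$ is the number of occurrences of r in position c, and n is the sequence count. For A in position 1: $n_{A,1} = 12$ and $n = 20$, so $F_{A,1} = 12/20 = 0.6$.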
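The profile definition can be sketched in a few lines. This is a minimal illustration, assuming an ungapped DNA alignment; the function name `build_profile` and the five-sequence toy alignment are hypothetical and are not the data behind the slides.

```python
from collections import Counter

def build_profile(alignment, alphabet="ACGT"):
    """Return F as {residue: [fraction per position]} with F[r][c] = n_{r,c} / n."""
    n = len(alignment)
    width = len(alignment[0])
    profile = {r: [0.0] * width for r in alphabet}
    for c in range(width):
        counts = Counter(seq[c] for seq in alignment)  # n_{r,c} for this column
        for r in alphabet:
            profile[r][c] = counts[r] / n
    return profile

# Hypothetical toy alignment (not the data from the slides).
alignment = ["AACATT", "AACTTT", "GACATT", "AAGATT", "TACATC"]
F = build_profile(alignment)
```

In this toy alignment A occurs in 3 of 5 sequences at position 1, so `F["A"][0]` is 0.6, and C never occurs there, so `F["C"][0]` is 0. Each column of F sums to 1.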
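33 Profiles. [Profile table: rows A, C, G, T by position.] Probability of AACATT being produced by the profile: $P = F_{A,1} \cdot F_{A,2} \cdot F_{C,3} \cdot F_{A,4} \cdot F_{T,5} \cdot F_{T,6}$. Is that good? Interpretation?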
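The probability of a sequence under a profile is the product of one profile entry per position, treating positions as independent. A minimal sketch follows, with a hypothetical three-position profile (the values are illustrative, not those of the slides) and a hypothetical helper name `sequence_probability`.

```python
import math

def sequence_probability(profile, seq):
    """P(seq | profile) = product over positions c of F[seq[c]][c]."""
    return math.prod(profile[r][c] for c, r in enumerate(seq))

# Hypothetical 3-position profile; each column sums to 1.
F = {
    "A": [0.6, 0.5, 0.0],
    "C": [0.0, 0.25, 0.8],
    "G": [0.2, 0.25, 0.1],
    "T": [0.2, 0.0, 0.1],
}
p_seen = sequence_probability(F, "AAC")    # 0.6 * 0.5 * 0.8
p_unseen = sequence_probability(F, "CAC")  # uses F[C][0] = 0
```

The raw value is hard to judge on its own: it shrinks with motif width and says nothing about how likely the same sequence is by chance, which is the motivation for a log-odds score. A single zero entry also forces the whole product to zero.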
39 PSSM: Better than profile. Want a log-odds score! PSSM = Position Specific Scoring Matrix. $M_{r,c} = 10 \log_2(F_{r,c}/\pi_r)$, where $\pi_r$ is the frequency of r in our data. Let $\pi_A = \pi_C = \pi_G = \pi_T = 0.25$. $M_{A,1} = 10 \log_2(F_{A,1}/0.25) = 10 \log_2(0.6/0.25) = 12.6$. $M_{C,2} = 10 \log_2(F_{C,2}/0.25) = 10 \log_2(0.25/0.25) = 0$.
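The two worked entries can be checked with a short sketch of the log-odds formula. The helper name `pssm_entry` is hypothetical, and returning negative infinity for a zero frequency is an assumption of this sketch, chosen to make the problem with unobserved residues visible.

```python
import math

def pssm_entry(f, background=0.25, scale=10.0):
    """M = scale * log2(f / background); -inf when the residue was never observed."""
    if f == 0.0:
        return float("-inf")
    return scale * math.log2(f / background)

m_a1 = pssm_entry(0.6)    # slide example: F_{A,1} = 0.6
m_c2 = pssm_entry(0.25)   # slide example: F_{C,2} = 0.25
m_zero = pssm_entry(0.0)  # residue never seen at this position
```

Rounded to one decimal, `m_a1` is 12.6 and `m_c2` is 0, matching the slide. Positive entries mark residues that are more frequent than the background, negative entries those that are rarer.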
41 PSSM M from our profile F. [Tables: profile F and PSSM M, rows A, C, G, T by position.] Score for AACATT: 40.5
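Scoring a sequence with a PSSM is a sum of one matrix entry per position, the log-odds counterpart of the product used for profiles. The sketch below uses a hypothetical three-position PSSM whose values are illustrative only and do not reproduce the table on the slide; `pssm_score` is likewise a made-up name.

```python
def pssm_score(pssm, seq):
    """Sum M[r][c] over the positions of seq."""
    return sum(pssm[r][c] for c, r in enumerate(seq))

# Hypothetical 3-position PSSM (illustrative values, not from the slides).
NEG_INF = float("-inf")
M = {
    "A": [12.6, 10.0, NEG_INF],
    "C": [NEG_INF, 0.0, 16.8],
    "G": [-3.2, 0.0, -13.2],
    "T": [-3.2, NEG_INF, -13.2],
}
score_good = pssm_score(M, "AAC")     # 12.6 + 10.0 + 16.8
score_blocked = pssm_score(M, "CAC")  # contains an unobserved residue
```

Under the convention assumed here, a residue that was never observed at a position contributes negative infinity, so any sequence containing it is rejected outright no matter how well the remaining positions match.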
43 Generalizing with PSSM? How to handle a new variant of a motif? [PSSM table: rows A, C, G, T by position.] Score for ATCTTT? = 56
49 Pseudo counts for profiles. Idea: Pretend you have seen all possible motifs. Pseudo counts: $\alpha_r$ is the number of pseudo observations of r. Include in profile calculations: $F_{r,c} = \frac{n_{r,c} + \alpha_r}{n + \sum_{r'} \alpha_{r'}}$. Example 1: Let $\alpha_A = \alpha_C = \alpha_G = \alpha_T = 1$. $F_{A,1} = \frac{12 + 1}{20 + 4} = 0.54$. Example 2: We had $n_{C,1} = 0$. $F_{C,1} = \frac{0 + 1}{20 + 4} = 0.042$. Result: Can use PSSM to find novel motifs.
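Both pseudo count examples can be reproduced with a short sketch. It assumes the same pseudo count for every residue, as in the slide's examples; the helper name `smoothed_fraction` is hypothetical.

```python
import math

def smoothed_fraction(count, n, alpha=1.0, alphabet_size=4):
    """F = (count + alpha) / (n + alphabet_size * alpha), equal pseudo counts per residue."""
    return (count + alpha) / (n + alphabet_size * alpha)

f_a1 = smoothed_fraction(12, 20)  # slide example 1: n_{A,1} = 12, n = 20
f_c1 = smoothed_fraction(0, 20)   # slide example 2: n_{C,1} = 0

# With a non-zero fraction the log-odds entry is finite.
m_c1 = 10 * math.log2(f_c1 / 0.25)
```

The results round to 0.54 and 0.042. Because no fraction is zero any more, every PSSM entry is finite, and an unseen variant receives a low score instead of being excluded.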
52 Fast motif searches. Motifs are small, therefore easy to search with. Fast. BLAST variants exist for motifs. E-value theory is the same thanks to the log-odds score!
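A naive way to search with a motif is to slide the PSSM along a sequence and report every window that scores above a threshold. This is only a sketch of that idea, not the BLAST variants mentioned on the slide, and it computes no E-values; the function `scan`, the matrix values and the threshold are all hypothetical.

```python
def scan(pssm, width, sequence, threshold):
    """Slide a window over sequence; return (start, score) for windows scoring >= threshold."""
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[r][c] for c, r in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical 3-position PSSM with finite entries (as after pseudo counts).
M = {
    "A": [10.0, 8.0, -20.0],
    "C": [-20.0, 0.0, 15.0],
    "G": [-5.0, 0.0, -10.0],
    "T": [-5.0, -20.0, -10.0],
}
hits = scan(M, 3, "GGAACTTAAC", threshold=25.0)
```

The scan finds the two occurrences of AAC, at 0-based offsets 2 and 7. The cost grows with sequence length times motif width, which stays small because motifs are short.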
55 Motif databases. PROSITE: Important binding sites. What motifs does my protein have? Profiles, pattern notation, careful documentation. BLOCKS: Origin of BLOSUM. Presents multialignments! Assembled from the most conserved parts of domains. PRINTS: What motif combinations does my protein have?
56 Next time: PSI-BLAST. Protein domains. Domain databases. Hidden Markov models?
More informationHidden Markov Models. based on chapters from the book Durbin, Eddy, Krogh and Mitchison Biological Sequence Analysis via Shamir s lecture notes
Hidden Markov Models based on chapters from the book Durbin, Eddy, Krogh and Mitchison Biological Sequence Analysis via Shamir s lecture notes music recognition deal with variations in - actual sound -
More informationGrundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson
Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)
More informationGibbs sampling. Massimo Andreatta Center for Biological Sequence Analysis Technical University of Denmark.
Gibbs sampling Massimo Andreatta Center for Biological Sequence Analysis Technical University of Denmark massimo@cbs.dtu.dk Technical University of Denmark 1 Monte Carlo simulations MC methods use repeated
More informationQuantifying sequence similarity
Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity
More informationSequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment
Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of
More informationComputational Molecular Biology (
Computational Molecular Biology (http://cmgm cmgm.stanford.edu/biochem218/) Biochemistry 218/Medical Information Sciences 231 Douglas L. Brutlag, Lee Kozar Jimmy Huang, Josh Silverman Lecture Syllabus
More informationSequence Analysis '17 -- lecture 7
Sequence Analysis '17 -- lecture 7 Significance E-values How significant is that? Please give me a number for......how likely the data would not have been the result of chance,......as opposed to......a
More informationPage 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence
Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression
More informationBiological Systems: Open Access
Biological Systems: Open Access Biological Systems: Open Access Liu and Zheng, 2016, 5:1 http://dx.doi.org/10.4172/2329-6577.1000153 ISSN: 2329-6577 Research Article ariant Maps to Identify Coding and
More informationHidden Markov Models Hamid R. Rabiee
Hidden Markov Models Hamid R. Rabiee 1 Hidden Markov Models (HMMs) In the previous slides, we have seen that in many cases the underlying behavior of nature could be modeled as a Markov process. However
More information20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming
20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, 2008 4 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance 4. Global and local alignment
More informationIntroductory course on Multiple Sequence Alignment Part I: Theoretical foundations
Sequence Analysis and Structure Prediction Service Centro Nacional de Biotecnología CSIC 8-10 May, 2013 Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations Course Notes Instructor:
More informationRNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"
RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure
More informationLecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models
Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm Alignment scoring schemes and theory: substitution matrices and gap models 1 Local sequence alignments Local sequence alignments are necessary
More informationBioinformatics: Secondary Structure Prediction
Bioinformatics: Secondary Structure Prediction Prof. David Jones d.jones@cs.ucl.ac.uk LMLSTQNPALLKRNIIYWNNVALLWEAGSD The greatest unsolved problem in molecular biology:the Protein Folding Problem? Entries
More informationGenome 559 Wi RNA Function, Search, Discovery
Genome 559 Wi 2009 RN Function, Search, Discovery The Message Cells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex New tools required alignment, discovery,
More informationAlignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)
Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in
More information