Potts Models and Protein Covariation. Allan Haldane Ron Levy Group
1 Temple University Structural Bioinformatics II Potts Models and Protein Covariation Allan Haldane Ron Levy Group
2 Outline. The Evolutionary Origins of Protein Sequence Variation: contents of the Pfam database; Protein Evolution (tour of concepts & current ideas); Protein Fitness, Marginal Stability, Compensatory mutations. Using protein sequence variation profitably: Correlated Mutations and Structural Contacts; Potts Models: Theory; Review of Potts Model Results. References: Protein Families and their Evolution: A Structural Perspective. Annual Review of Biochemistry 2005. Bridging the physical scales in evolutionary biology: from protein sequence space to fitness of organisms and populations. COSB 2017. Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. COSB 2017.
3 Pfam. Pfam is a collection of HMMs and MSAs of many protein families. It uses HMMs to detect new sequences. It contains 16,306 curated protein families (folds), 23 million protein sequences, and 7.6 billion residues, collected from all branches of life.
4 Pfam website example Fibronectin type III Seed Matches All fibronectin sequences in the alignment have the same beta sandwich fold. ~65,000 different sequences ~100 residues long How similar do you think the sequences are (% identical aa)?
5 Fibronectin Type III domain MSA. Average sequence identity: 19%. [Figure panels: WebLogo; sequence identity; MSA]
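The average sequence identity quoted on this slide can be computed directly from an alignment. The following is a minimal sketch, assuming the MSA is a list of equal-length strings with `-` for gaps; the function names and the toy alignment are illustrative and not taken from the lecture.

```python
from itertools import combinations

def pairwise_identity(seq_a, seq_b):
    """Fraction of aligned columns with identical residues.

    Columns where either sequence has a gap ('-') are skipped
    (one of several possible conventions; an assumption here).
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)

def average_identity(msa):
    """Mean pairwise identity over all distinct sequence pairs in the MSA."""
    scores = [pairwise_identity(a, b) for a, b in combinations(msa, 2)]
    return sum(scores) / len(scores)

# Toy alignment (hypothetical data, 3 sequences of length 5).
toy_msa = ["ACDEF", "ACDFF", "GCDE-"]
avg = average_identity(toy_msa)
```

For a real Pfam alignment with tens of thousands of sequences, the all-pairs loop becomes expensive, so a random subsample of pairs is a common shortcut.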
6 Fibronectin has quite typical diversity
7 The Protein Universe. The mapping from protein sequence to structure is highly degenerate: typically ~30-40% identity per fold, and as low as 8-9% (equal to random sequences). Compare the space of all possible sequences with the space of all sequences sharing a common folded structure (rough intuition/guess). (This poses a challenge for HMM methods: below 20% identity, HMMs have difficulty predicting whether sequences share a fold. This is known as the twilight zone of sequence similarity.) References: How much of protein sequence space has been explored by life on Earth? Dryden et al. J Royal Society Interface (2008). Twilight zone of protein sequence alignments. Rost, Protein Eng Des Sel (1999).
8 The Protein Universe. Many authors suggest there are only 1000 to 10,000 existing protein folds across all life: 16,384 families in Pfam; 1086 folds and 3464 families in SCOP (compare to 20,000 protein-coding genes in the human genome). References: Protein Families and their Evolution: A Structural Perspective. Annual Review of Biochemistry 2005. Estimating the total number of protein folds. Govindarajan et al. Proteins: Struct. Func. Bioinfo. Expanding protein universe and its origin from the Biological Big Bang. PNAS 2002. The Protein Folds as Platonic Forms. Denton et al. Journal of Theoretical Biology 2002.
9 Evolutionary origins of proteins. Genomes contain multiple variants of the same protein fold. New variants/copies are generated by: gene duplication (paralogs, often different function), e.g. the tree of 512 kinases in the human genome; and speciation (orthologs, usually same function), e.g. the tree of c-src kinase across species.
10 How is sequence diversity generated? Observations More closely related species have fewer differences per (orthologous) protein % difference in Hemoglobin between related species The Neutral Theory of Molecular Evolution. Kimura 1983
11 How is sequence diversity generated? Observations (fossil record): substitutions occur at a constant rate (plot axis: # of substitutions per site). This is the rate-constancy hypothesis, or Molecular Clock (Zuckerkandl & Pauling 1965). Technical detail (correction for repeated substitutions): p = percent final sequence difference; K = number of past substitution events.
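The correction formula itself did not survive in this transcript. One standard choice relating p and K is the Poisson correction, K = -ln(1 - p), with p expressed as a fraction of differing sites and K in substitutions per site. The sketch below assumes that form; it may not be the exact expression shown on the slide.

```python
import math

def poisson_corrected_distance(p):
    """Estimated substitutions per site K from the observed fraction p of
    differing sites, using the Poisson correction K = -ln(1 - p).

    Assumption: all sites evolve at the same rate and back-substitutions
    to the original residue are ignored.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)

# K exceeds p because repeated substitutions at one site are hidden.
k_small = poisson_corrected_distance(0.05)
k_large = poisson_corrected_distance(0.50)
```

For small p the correction is negligible (K is close to p), while at 50% observed difference the estimated number of substitutions per site is already about 0.69.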
12 How is sequence diversity generated? Observations Proteins appear to accumulate substitutions at a constant rate, usually about 1 substitution per site per billion years
13 How is sequence diversity generated? Observations Proteins appear to accumulate substitutions at a constant rate, usually about 1 substitution per site per billion years Different proteins have different evolutionary rates
14 How is sequence diversity generated? Observations Proteins appear to accumulate substitutions at a constant rate, usually about 1 substitution per site per billion years Different proteins have different evolutionary rates Different parts of a single protein have different evolutionary rates Why is there variation in evolutionary rate? Causes of evolutionary rate variation among protein sites. Nature Reviews Genetics (2016)
15 Summary Many possible sequences lead to same fold Proteins in a common family/fold accumulate substitutions at a constant rate over time
16 How is sequence diversity generated? Why is there so much variation? How do substitutions happen? Selective pressure (Natural Selection) on protein sequences: function; stability (folding); (non-)aggregation; ... Evolutionary forces acting on proteins: Natural Selection (selective pressures above); Mutation; Genetic Drift. Reference: Bridging the physical scales in evolutionary biology: from protein sequence space to fitness of organisms and populations. COSB 2017.
17 How is sequence diversity generated? Why is there so much variation? How do substitutions happen? Selective pressure (Natural Selection) on protein sequences: function (the functional site is often a small % of the protein); stability (folding; many mutations affect stability); (non-)aggregation; ... Evolutionary forces acting on proteins: Natural Selection (selective pressures above); Mutation; Genetic Drift. Reference: Bridging the physical scales in evolutionary biology: from protein sequence space to fitness of organisms and populations. COSB 2017.
18 Protein Folding Biochemistry (understanding selective forces in protein evolution). Two-state model of folding: Folded and Unfolded. Reality is more complicated: disordered state; molten globule state; native state (folded); decoy state; hyper-stability.
19 Variations in Stability. Distribution of stabilities: different sequences will have (slightly) different stabilities. Proteins are marginally stable. Possible explanations: Hyper-stability is penalized? More unstable sequences/mutations? Stability is related to overall fitness. Stability varies as proteins evolve. Reference: Missense meanderings in sequence space: a biophysical view of protein evolution. DePristo, Weinreich, Hartl. Nat Rev Genet 2005.
20 Compensatory Mutations/Substitutions. E.g., a destabilizing substitution is compensated for by a stabilizing substitution. Epistasis: when the effect of a mutation (e.g., on stability) depends on the identity of other residues. Reference: CTL escape and viral fitness in HIV/SIV infection. Front. Microbiol. 2010.
21 Protein Evolution. Why do deleterious (stability-reducing) mutations occur? Why aren't proteins optimally stable? Evolutionary forces acting on proteins: Natural Selection (previous few slides); Mutation; Genetic Drift (next 2 slides); Recombination (not discussed). Quick intro to the Wright-Fisher Model & Population Genetics.
22 The Wright-Fisher Model (without natural selection). We need to understand how new variants arise at the population level. Genetic drift = fluctuations in allele frequencies. It causes new alleles to fix in the population even without any natural selection. Example: a population of 10 asexual individuals with different (equally fit) genotypes. The next generation is formed by a random sample (with replacement) of the previous generation. [Figure: Simulation 1, in which the cyan genotype has fixed, and Simulation 2; horizontal axis: time]
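The neutral resampling scheme described on this slide is short enough to simulate directly. This is a minimal sketch under the slide's assumptions (asexual individuals, equal fitness, fixed population size); the function name, the generation cap, and the genotype labels are illustrative.

```python
import random

def wright_fisher_neutral(population, rng, max_generations=10_000):
    """Resample the population with replacement until one genotype fixes.

    Returns the final population and the number of generations simulated.
    All genotypes are equally fit, so only genetic drift acts.
    """
    generations = 0
    while len(set(population)) > 1 and generations < max_generations:
        # Each individual of the next generation copies a random parent.
        population = [rng.choice(population) for _ in population]
        generations += 1
    return population, generations

rng = random.Random(0)            # fixed seed for reproducibility
start = list("ABCDEFGHIJ")        # 10 individuals, 10 distinct genotypes
final, t = wright_fisher_neutral(start, rng)
```

Rerunning with different seeds shows different genotypes fixing, which is the point of the two simulations on the slide: fixation happens by chance alone.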
23 The Wright-Fisher Model: two-allele case, with selection. Scenario: all individuals in the population have the same protein, but one individual mutates. Natural selection is modelled by assigning a weight (fitness) to each genotype and performing a weighted sample to get the next generation. Say the old genotype has a weight of 1 and the mutant individual has a weight of $1+s$, where $s$ is the selection coefficient and $N$ is the population size. Then the mutant's fixation probability is $P_{fix} = (1 - e^{-2s}) / (1 - e^{-2Ns})$ (Kimura's fixation probability, written here in its haploid form). $s < 0$ is deleterious, $s = 0$ is neutral (where $P_{fix} = 1/N$), and $s > 0$ is beneficial. Conclusion: genetic drift can cause a new mutant to fix even if it's deleterious ($s < 0$).
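To get a feel for the numbers, Kimura's fixation probability can be evaluated for deleterious, neutral, and beneficial mutants. The sketch below assumes the haploid diffusion-approximation form $(1 - e^{-2s}) / (1 - e^{-2Ns})$ for a single initial mutant; the function name and the example values of s and N are illustrative.

```python
import math

def fixation_probability(s, N):
    """Kimura's fixation probability for one new mutant in a haploid
    population of size N with selection coefficient s (assumed form:
    (1 - exp(-2s)) / (1 - exp(-2Ns)))."""
    if abs(s) < 1e-12:
        return 1.0 / N            # neutral limit
    # expm1(x) = exp(x) - 1 keeps precision for small |s|.
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)

N = 100
p_deleterious = fixation_probability(-0.01, N)
p_neutral = fixation_probability(0.0, N)
p_beneficial = fixation_probability(0.01, N)
```

With N = 100, the slightly deleterious mutant still fixes with a probability of roughly a third of the neutral value 1/N, which illustrates the slide's conclusion that drift can fix deleterious mutants.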
24 Mutation-Selection Balance. Mutation: an individual mutates to a new variant. Substitution: a mutant genotype appears and fixes in the population. Most protein mutations slightly decrease stability (destabilizing, deleterious). A small number of mutations increase stability (stabilizing). Most deleterious mutations do not fix. Mutation-selection balance: # deleterious substitutions = # beneficial substitutions. (Population genetics theory can be used to quantitatively understand when/how this balance occurs.) (This is an alternative explanation for why proteins are marginally stable.) References: Why are proteins marginally stable? Taverna, Goldstein. Proteins: Structure, Function, and Bioinformatics 2002. Missense meanderings in sequence space: a biophysical view of protein evolution. DePristo, Weinreich, Hartl. Nat Rev Genet 2005. Stability effects of mutations and protein evolvability. Tokuriki, Tawfik. Current Opinion in Structural Biology 2009. How Protein Stability and New Functions Trade Off. Tokuriki, Stricher, Serrano, Tawfik. PLoS Comput Biol 2008.
25 Summary Many possible sequences lead to same fold Proteins in a common family/fold accumulate substitutions at a constant rate over time Most substitutions affect protein stability There is a dynamic balance of slightly deleterious (destabilizing) and slightly beneficial (stabilizing) substitutions over time. Marginal stability is maintained. This dynamic balance also involves: Compensatory mutations Epistatic interactions
26 Part II: Potts Models (using protein sequence variation to study structure). Outline: Motivation and Background; Parameterizing a Potts Model; Applications of Potts models. Contacts in protein structure, compensatory mutations, correlations in MSA columns?
27 Coevolutionary Analysis and Potts Models. Correlated mutations in an MSA imply structural interactions. Long history (25 years) of coevolutionary analysis: detect correlated positions, then predict contacts. Recent developments: instead of modeling each residue pair individually, build a correlated statistical model of the MSA: the Potts model. The model can be used for more than contact prediction. Reference: Lövkvist et al, PRE 87, 2013.
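28 How to measure correlations in an MSA. For positions $i, j$ and residue types $\alpha, \beta$: the bivariate marginal (frequency) $f_{ij}(\alpha,\beta)$ is the fraction of sequences with $\alpha$ at position $i$ and $\beta$ at position $j$; the univariate marginal (frequency) $f_i(\alpha)$ is the fraction of sequences with $\alpha$ at position $i$. Correlations: $C_{ij}(\alpha,\beta) = f_{ij}(\alpha,\beta) - f_i(\alpha) f_j(\beta)$, i.e. the observed pairwise frequency minus the pairwise frequency expected if the positions vary independently. $C_{ij}(\alpha,\beta) = 0$ if the two positions vary independently.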
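The marginals and the correlation term (observed pairwise frequency minus the product of single-site frequencies) can be counted straight from an alignment. A minimal sketch, assuming the MSA is a list of equal-length strings and ignoring sequence weighting; function names and toy alignments are illustrative.

```python
from collections import Counter
from itertools import product

def univariate(msa, i):
    """Frequency of each residue type at column i."""
    counts = Counter(seq[i] for seq in msa)
    return {a: c / len(msa) for a, c in counts.items()}

def bivariate(msa, i, j):
    """Frequency of each residue pair at columns (i, j)."""
    counts = Counter((seq[i], seq[j]) for seq in msa)
    return {ab: c / len(msa) for ab, c in counts.items()}

def correlation(msa, i, j):
    """Observed pair frequency minus the frequency expected under
    independent variation, for every residue pair seen at i and j."""
    fi, fj, fij = univariate(msa, i), univariate(msa, j), bivariate(msa, i, j)
    return {(a, b): fij.get((a, b), 0.0) - fi[a] * fj[b]
            for a, b in product(fi, fj)}

correlated = correlation(["AA", "AA", "BB", "BB"], 0, 1)
independent = correlation(["AA", "AB", "BA", "BB"], 0, 1)
```

In the first toy alignment the two columns always change together, so the matching pairs have positive correlation and the mismatched pairs negative; in the second, every entry is zero.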
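29 Pairwise measures of correlations in an MSA. We want a correlation score between position pairs (summed over residue types $\alpha, \beta$). There are different scoring methods in the literature (two shown below), all designed to give a score of 0 for independent variation. Mutual Information (MI): $MI_{ij} = \sum_{\alpha\beta} f_{ij}(\alpha,\beta) \log \frac{f_{ij}(\alpha,\beta)}{f_i(\alpha) f_j(\beta)}$, where $f_i(\alpha)$ and $f_{ij}(\alpha,\beta)$ are the univariate and bivariate marginals. It can be interpreted as the negative log likelihood of generating the distribution $f_{ij}$ when sampling from the distribution $f_i f_j$. Statistical Coupling Analysis (SCA): uses the probability of $\alpha$ at $i$ excluding sequences with a mutation at $j$; a bar denotes an average over all positions.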
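Mutual information between two MSA columns follows from the same counts as the marginals. A minimal sketch, assuming an unweighted MSA given as a list of equal-length strings and natural logarithms; the function name and toy alignments are illustrative.

```python
import math
from collections import Counter

def mutual_information(msa, i, j):
    """MI between columns i and j: sum over observed residue pairs of
    f_ij * log(f_ij / (f_i * f_j)). Pairs never observed contribute 0."""
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi

mi_correlated = mutual_information(["AA", "AA", "BB", "BB"], 0, 1)
mi_independent = mutual_information(["AA", "AB", "BA", "BB"], 0, 1)
```

Perfectly co-varying two-state columns give log 2, the maximum for two equally frequent residue types, while independent columns give exactly zero.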
30 Relationship to contacts. The top-ranking MI (and other) scores find the top contact with about a 70% true-positive rate, and the top 50 at 50%. Can this be improved? References: Protein 3D Structure Computed from Evolutionary Sequence Variation. Marks et al. PLoS One 2011. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Dunn, Wahl, Gloor. Bioinformatics 2008.
31 Direct vs Indirect Correlations. Problem: MI, SCA, and $C_{ij}$ can be thought of as local models of the correlation: they look at single pairs at a time. However, correlations can be caused by indirect interactions, or correlated networks, and these local models ignore this. E.g.: position 7 interacts with 15, and 15 interacts with 25. Then 7 and 25 will be correlated even though they don't interact. Instead we want to make a global model of the correlations, in order to distinguish direct from indirect correlations. Idea: make a statistical model of the sequence as a whole, i.e. model the probability of a whole sequence rather than the probability of a pair of residues.
32 Potts Models: Origin & Motivation. How to model $P(S)$? $P(S)$ describes the probability of generating sequence $S$, where $S$ spans the entire sequence space. The most general form of $P(S)$ is the set of probabilities of all sequences: a model with $q^L$ parameters (for $q$ residue types and sequence length $L$)! We can't directly measure $P(S)$ from an MSA (since each sequence appears once), unlike the bivariate marginals, which can be directly measured from an MSA. Solution: get the least biased distribution subject to constraints: the Maximum Entropy distribution $P(S)$.
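33 Maximum Entropy. The Maximum Entropy distribution $P(S)$: maximizing entropy minimizes the amount of prior information built into the distribution. The number of model parameters will be equal to the number of constraints. Entropy of a distribution: $H[P] = -\sum_S P(S) \log P(S)$. In our case, we set the constraint that $P(S)$ gives the right bivariate marginals (pairwise correlation statistics). Bivariate marginals from $P(S)$: $f_{ij}(\alpha,\beta) = \sum_S P(S)\, \delta(s_i,\alpha)\, \delta(s_j,\beta)$, a sum over the entire sequence space, where $s_i$ is the residue of $S$ at position $i$.
34 Maximum Entropy. Entropy of a distribution: $H[P] = -\sum_S P(S) \log P(S)$. Constraints: $\sum_S P(S)\, \delta(s_i,\alpha)\, \delta(s_j,\beta) = f_{ij}(\alpha,\beta)$ for all $i < j$ and all $\alpha, \beta$, plus normalization $\sum_S P(S) = 1$. Method of Lagrange multipliers: $\mathcal{L} = H[P] - \sum_{i<j} \sum_{\alpha\beta} \lambda_{ij}(\alpha,\beta) \left[ \sum_S P(S)\, \delta(s_i,\alpha)\, \delta(s_j,\beta) - f_{ij}(\alpha,\beta) \right] - \lambda_0 \left[ \sum_S P(S) - 1 \right]$, with Lagrange multipliers $\lambda$, one per constraint. Maximize by solving $\partial \mathcal{L} / \partial P(S) = 0$ for all $S$.
35 Maximum Entropy. Solved by: $-\log P(S) - 1 - \sum_{i<j} \lambda_{ij}(s_i,s_j) - \lambda_0 = 0$. Rearrange to give: $P(S) = e^{-E(S)} / Z$ (Boltzmann distribution), where the multipliers are identified with couplings $J_{ij}$ and single-site terms are separated out as fields $h_i$ (a choice of gauge), so that $E(S) = \sum_i h_i(s_i) + \sum_{i<j} J_{ij}(s_i,s_j)$ ("statistical energy") and $Z = \sum_S e^{-E(S)}$ (normalization). This gives the Potts model. Note: this model is named for its history in the physics of magnetic materials, and has many other applications.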
36 Form of the Potts Model. Fields $h_i(\alpha)$: $L \times q$ parameters. Couplings $J_{ij}(\alpha,\beta)$: ~$(L \times q)^2$ parameters. $L$ = sequence length (e.g. 200); $q$ = # of residue types (e.g. 20). Note the similarity of fields to PSSMs. Example sequence: A G A A R G I V F A A R A A F A. Potts parameters are interpreted as energy contributions from each position/pair.
37 Form of the Potts Model. [Figure: sequence landscape; fields and couplings drawn on the example sequence A G A A R G I V F A A R A A F A; axes: statistical (Potts) energy vs. sequence prevalence. Image: Dill] Given known values for the fields and couplings: P(S) gives us a probability for any sequence (can compute other statistics); E(S) gives us a statistical energy landscape (can model the effect of mutations, with epistasis + compensation); coupling values give us info about direct interactions between positions (without indirect interactions).
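Given fields and couplings, the statistical energy and probability of any sequence can be evaluated directly. The sketch below assumes the sign convention P(S) proportional to exp(-E(S)) with E(S) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j); the dictionary layout for `h` and `J`, the two-letter alphabet, and the parameter values are all illustrative. Exact normalization by enumeration is only feasible for tiny toy systems.

```python
import math
from itertools import product

def potts_energy(seq, h, J):
    """Statistical energy: field terms plus coupling terms over pairs i<j."""
    L = len(seq)
    fields = sum(h[i][seq[i]] for i in range(L))
    couplings = sum(J[(i, j)][(seq[i], seq[j])]
                    for i in range(L) for j in range(i + 1, L))
    return fields + couplings

def potts_probabilities(h, J, alphabet):
    """Exact P(S) = exp(-E(S)) / Z by enumerating all q^L sequences."""
    L = len(h)
    seqs = ["".join(s) for s in product(alphabet, repeat=L)]
    weights = {s: math.exp(-potts_energy(s, h, J)) for s in seqs}
    Z = sum(weights.values())     # partition function (normalization)
    return {s: w / Z for s, w in weights.items()}

# Toy model: 2 positions, 2 residue types, a coupling favouring equal residues.
alphabet = "AB"
h = [{"A": 0.0, "B": 0.0}, {"A": 0.0, "B": 0.0}]
J = {(0, 1): {("A", "A"): -1.0, ("B", "B"): -1.0,
              ("A", "B"): 0.0, ("B", "A"): 0.0}}
P = potts_probabilities(h, J, alphabet)
```

Here the lower-energy sequences AA and BB are more prevalent than AB and BA, matching the energy-versus-prevalence picture on the slide. The effect of a mutation can be scored as the energy difference between the mutant and the original sequence.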
38 Parameterizing the Model given an MSA. Above we found the functional form of the Maximum Entropy distribution, but we did not discuss how to find the values of the parameters. This is actually a challenging task. We need to find the set of values which satisfy the constraints on the marginals, but there is no obvious way to do so: the marginals are a non-trivial function of the Potts parameters.
39 Parameterizing the Model given an MSA. A number of different numerical methods and approximations have been developed to find the parameters: Belief Propagation, Susceptibility Propagation; Mean Field inference; Pseudolikelihood methods + Conjugate Gradient Descent; Cluster Expansion; Monte Carlo + Quasi-Newton optimization. This is a computationally intensive task.
40 Parameterizing the Model given an MSA. Flavor of the algorithms: the problem can be framed as a Maximum Likelihood inference. Define a likelihood function (the probability of the MSA according to the model) which has a maximum when the constraints are satisfied. Conjugate Gradient methods, Quasi-Newton methods: start with an initial guess for the parameters; compute the local gradient of the likelihood; take a small step in that direction (update the parameters); repeat.
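The guess, gradient, step, repeat loop can be shown on a toy system small enough to enumerate exactly. This sketch uses plain gradient ascent rather than the conjugate-gradient or quasi-Newton methods named on the slide, assumes P(S) proportional to exp(-E(S)) with couplings only (fields absorbed into the couplings), and takes its target marginals from a known toy model instead of a real MSA. All names, the learning rate, and the step count are illustrative. Under this sign convention the gradient of the log-likelihood with respect to each coupling is the model marginal minus the data marginal.

```python
import math
from itertools import combinations, product

ALPHABET = "AB"
L = 3
PAIRS = list(combinations(range(L), 2))
SEQS = list(product(ALPHABET, repeat=L))     # all q^L = 8 sequences

def zero_couplings():
    return {p: {ab: 0.0 for ab in product(ALPHABET, repeat=2)} for p in PAIRS}

def energy(seq, J):
    return sum(J[(i, j)][(seq[i], seq[j])] for i, j in PAIRS)

def model_marginals(J):
    """Exact bivariate marginals of P(S) = exp(-E(S)) / Z by enumeration."""
    weights = [math.exp(-energy(s, J)) for s in SEQS]
    Z = sum(weights)
    f = zero_couplings()                     # same shape, reused as accumulator
    for s, w in zip(SEQS, weights):
        for i, j in PAIRS:
            f[(i, j)][(s[i], s[j])] += w / Z
    return f

def fit_couplings(f_data, lr=0.3, steps=2000):
    """Gradient ascent on the log-likelihood, starting from J = 0."""
    J = zero_couplings()
    for _ in range(steps):
        f_model = model_marginals(J)
        for p in PAIRS:
            for ab in J[p]:
                # Raise the energy of pairs the model over-produces,
                # lower it for pairs the model under-produces.
                J[p][ab] += lr * (f_model[p][ab] - f_data[p][ab])
    return J

# Target marginals generated from a known toy model (hypothetical values).
J_true = zero_couplings()
J_true[(0, 1)][("A", "A")] = -1.0
J_true[(0, 1)][("B", "B")] = -1.0
J_true[(1, 2)][("A", "B")] = -0.5
f_data = model_marginals(J_true)

J_fit = fit_couplings(f_data)
f_fit = model_marginals(J_fit)
max_error = max(abs(f_fit[p][ab] - f_data[p][ab])
                for p in PAIRS for ab in f_data[p])
```

The fitted couplings reproduce the target marginals but need not equal `J_true` entry by entry, because different parameter sets can describe the same distribution (gauge freedom). For real proteins the sum over all sequences is intractable, which is why the approximate methods listed on the previous slide are needed.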
41 Aside: Correction for Phylogeny and Sampling Biases. Sequences may be phylogenetically related, so we may have a biased sample. This may give the appearance of correlations even when there are none. E.g.:
wild type: AAAAAAAAAAAAAAAAAA
single mutants: AAAAABAAAAAAAAAAAA and AAAAAAAAAABAAAAAAA
double mutant: AAAAABAAAABAAAAAAA
If we oversample the double mutant, we overestimate the correlation. One solution: weight each sequence by how many sequences are similar to it, $w_n = 1 / (\text{number of sequences similar to sequence } n)$, and compute the marginals as weighted averages, e.g. $f_i(\alpha) = \frac{1}{N_{eff}} \sum_n w_n\, \delta(s^n_i, \alpha)$, where $N_{eff} = \sum_n w_n$ is the effective number of sequences.
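The down-weighting of over-sampled sequences can be sketched as follows. The similarity criterion is an assumption: here two sequences count as similar when their fractional identity is at least 0.8, a commonly used cutoff that is not stated on the slide. Function names and the toy alignment are illustrative.

```python
def sequence_weights(msa, identity_threshold=0.8):
    """Weight of each sequence = 1 / (number of sequences, itself included,
    whose identity to it is at least identity_threshold)."""
    weights = []
    for s in msa:
        neighbours = sum(
            1 for t in msa
            if sum(a == b for a, b in zip(s, t)) / len(s) >= identity_threshold
        )
        weights.append(1.0 / neighbours)
    return weights

def weighted_frequency(msa, weights, i, residue):
    """Weighted univariate marginal of `residue` at column i."""
    n_eff = sum(weights)          # effective number of sequences
    return sum(w for s, w in zip(msa, weights) if s[i] == residue) / n_eff

# Three copies of one sequence plus one distinct sequence (toy data).
toy_msa = ["AAAAA", "AAAAA", "AAAAA", "BBBBB"]
w = sequence_weights(toy_msa)
n_eff = sum(w)
f_A = weighted_frequency(toy_msa, w, 0, "A")
```

The three identical sequences share one unit of weight, so the effective number of sequences is 2 and the weighted frequency of A is 0.5 rather than the unweighted 0.75.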
42 Parameterizing the Model given an MSA. Summary of the inference procedure: 1) Obtain an MSA (e.g. from Pfam). 2) Apply phylogenetic weighting. Need > 1000 effective sequences for precise marginals. 3) Compute the bivariate (and univariate) marginals of the data. 4) Perform parameter inference (e.g. gradient descent) given the bivariate marginals. We end up with a set of parameters which we can use in a number of ways.
43 Potts Model Applications. Contact Prediction; Mutant stability; Contact maps; Ab-initio Structure Prediction; Free Energy (Conformational) Landscapes; Fitness landscapes; Melting temperature; Seq. Prevalence; Electrostatic coupling; Structure prediction; Viral fitness; Enzyme fitness. Reference: Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Levy, Haldane, Flynn. COSB 2017.
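44 Application 1: Contact Prediction. We want an interaction score (like MI or SCA) but using the Potts model, i.e. we want to summarize the coupling values for each position pair (sum/average over residue types $\alpha, \beta$). Frobenius norm of couplings: $F_{ij} = \sqrt{\sum_{\alpha\beta} J_{ij}(\alpha,\beta)^2}$. APC correction (removes 'background'): $F^{APC}_{ij} = F_{ij} - \bar{F}_{i\cdot}\, \bar{F}_{\cdot j} / \bar{F}_{\cdot\cdot}$, where bars denote averages over positions. Direct Information (DI): similar to MI, but excludes indirect interactions since it is computed using direct couplings. (Some technical details related to gauges are not discussed here.)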
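The two scoring steps on this slide, the Frobenius norm of each coupling block followed by the average product correction (APC), can be sketched as below. The layout of `J` as a dictionary of per-pair blocks and the toy values are illustrative, and the gauge transformation usually applied before taking the norm is skipped, as on the slide.

```python
import math

def frobenius_scores(J, L):
    """Symmetric L x L matrix of Frobenius norms of the coupling blocks."""
    F = [[0.0] * L for _ in range(L)]
    for (i, j), block in J.items():
        score = math.sqrt(sum(v * v for v in block.values()))
        F[i][j] = F[j][i] = score
    return F

def apc_correct(F):
    """Average product correction: F_ij - mean_i * mean_j / overall mean,
    with means taken over off-diagonal entries."""
    L = len(F)
    row_mean = [sum(F[i][k] for k in range(L) if k != i) / (L - 1)
                for i in range(L)]
    total_mean = (sum(F[i][j] for i in range(L) for j in range(L) if i != j)
                  / (L * (L - 1)))
    return [[0.0 if i == j else F[i][j] - row_mean[i] * row_mean[j] / total_mean
             for j in range(L)] for i in range(L)]

# Toy couplings for 3 positions: pair (0, 1) is strongly coupled.
J = {
    (0, 1): {("A", "A"): 3.0, ("B", "B"): 4.0},
    (0, 2): {("A", "A"): 1.0},
    (1, 2): {("A", "A"): 1.0},
}
F = frobenius_scores(J, 3)
F_apc = apc_correct(F)
```

Ranking position pairs by the corrected score and keeping the top entries gives the predicted contact list; in this toy case pair (0, 1) stays clearly on top after the background is removed.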
45 Application 1: Contact Prediction Contact Map from Potts Model Contact Map from PDB structures (Protein-Kinase domain) Can achieve 80% True Positive rate for top 200 contacts.
46 Application 1: Contact Prediction. [Figure labels: direct interactions; indirect interactions; non-interacting] DI gives many more true positives (red) than MI. DI distinguishes direct from indirect interactions; MI does not. Reference: Identification of direct residue contacts in protein-protein interaction by message passing. Weigt, White, Szurmant, Hoch, Hwa. PNAS 2009.
47 Application 1: Structure Prediction. Idea: use predicted contacts as input to further algorithms: NMR (distance geometry: contact map to structure); Go models (coarse-grained MD). Reference: Genomics-aided structure prediction. Sułkowska, Morcos, Weigt, Hwa, Onuchic. PNAS 2012.
48 Application 2: Free Energy and Conformational Landscapes. The Potts energy E(S) reflects experimental mutant-stability measurements and melting temperatures. Biased MD/Go simulations using contacts as bias/constraints can uncover the conformational landscape. References: Quantification of the effect of mutations using a global probability model of natural sequence variation. Hopf, Ingraham, Poelwijk, Springer, Sander, Marks. Oct 2015. Coevolutionary signals across protein lineages help capture multiple protein conformations. Morcos, Jana, Hwa, Onuchic. PNAS 2013. Coevolutionary information, protein folding landscapes, and the thermodynamics of natural selection. Morcos, Schafer, Cheng, Onuchic, Wolynes. PNAS 2014.
49 Application 3: Fitness Landscapes (enzyme fitness, viral fitness). The Potts energy E(S) reflects the fitness of sequences and mutants. The Potts model can describe epistatic effects and compensatory mutations. References: Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1. Molecular Biology and Evolution 2015. The Fitness Landscape of HIV-1 Gag: Advanced Modeling Approaches and Validation of Model Predictions by in vitro Testing. PLoS Comput Biol 2014.
- Phylogeny? - Systematics? The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogenetic systematics? Connection between phylogeny and classification. - Phylogenetic systematics informs the
More informationPhylogenetic Tree Reconstruction
I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven
More informationHaploid & diploid recombination and their evolutionary impact
Haploid & diploid recombination and their evolutionary impact W. Garrett Mitchener College of Charleston Mathematics Department MitchenerG@cofc.edu http://mitchenerg.people.cofc.edu Introduction The basis
More informationBLAST. Varieties of BLAST
BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database
More informationLaying down deep roots: Molecular models of plant hormone signaling towards a detailed understanding of plant biology
Laying down deep roots: Molecular models of plant hormone signaling towards a detailed understanding of plant biology Alex Moffett Center for Biophysics and Quantitative Biology PI: Diwakar Shukla Department
More informationSta$s$cal Physics, Inference and Applica$ons to Biology
Sta$s$cal Physics, Inference and Applica$ons to Biology Physics Department, Ecole Normale Superieure, Paris, France. Simona Cocco Office:GH301 mail:cocco@lps.ens.fr Deriving Protein Structure and Func$on
More informationMutational effects and the evolution of new protein functions
Mutational effects and the evolution of new protein functions Misha Soskine and Dan S. Tawfik Abstract The divergence of new genes and proteins occurs through mutations that modulate protein function.
More informationCopyright 2000 N. AYDIN. All rights reserved. 1
Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment
More informationCHAPTER 23 THE EVOLUTIONS OF POPULATIONS. Section C: Genetic Variation, the Substrate for Natural Selection
CHAPTER 23 THE EVOLUTIONS OF POPULATIONS Section C: Genetic Variation, the Substrate for Natural Selection 1. Genetic variation occurs within and between populations 2. Mutation and sexual recombination
More informationOrthology Part I: concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona
Orthology Part I: concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona (tgabaldon@crg.es) http://gabaldonlab.crg.es Homology the same organ in different animals under
More informationMultiple Sequence Alignment: HMMs and Other Approaches
Multiple Sequence Alignment: HMMs and Other Approaches Background Readings: Durbin et. al. Section 3.1, Ewens and Grant, Ch4. Wing-Kin Sung, Ch 6 Beerenwinkel N, Siebourg J. Statistics, probability, and
More informationExample of Function Prediction
Find similar genes Example of Function Prediction Suggesting functions of newly identified genes It was known that mutations of NF1 are associated with inherited disease neurofibromatosis 1; but little
More informationTHE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION
THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION AND CALIBRATION Calculation of turn and beta intrinsic propensities. A statistical analysis of a protein structure
More informationChapter 7: Covalent Structure of Proteins. Voet & Voet: Pages ,
Chapter 7: Covalent Structure of Proteins Voet & Voet: Pages 163-164, 185-194 Slide 1 Structure & Function Function is best understood in terms of structure Four levels of structure that apply to proteins
More informationGenetic Drift in Human Evolution
Genetic Drift in Human Evolution (Part 2 of 2) 1 Ecology and Evolutionary Biology Center for Computational Molecular Biology Brown University Outline Introduction to genetic drift Modeling genetic drift
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction
CMPS 6630: Introduction to Computational Biology and Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the
More informationGene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009
Gene Genealogies Coalescence Theory Annabelle Haudry Glasgow, July 2009 What could tell a gene genealogy? How much diversity in the population? Has the demographic size of the population changed? How?
More informationComputational approaches for functional genomics
Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding
More informationCMPS 3110: Bioinformatics. Tertiary Structure Prediction
CMPS 3110: Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the laws of physics! Conformation space is finite
More informationModule Contact: Dr Doug Yu, BIO Copyright of the University of East Anglia Version 1
UNIVERSITY OF EAST ANGLIA School of Biological Sciences Main Series UG Examination 2013-2014 EVOLUTIONARY BIOLOGY AND CONSERVATION GENETICS BIO-3C24 Time allowed: 3 hours Answer ALL questions in Section
More informationSession 5: Phylogenomics
Session 5: Phylogenomics B.- Phylogeny based orthology assignment REMINDER: Gene tree reconstruction is divided in three steps: homology search, multiple sequence alignment and model selection plus tree
More informationTHEORY. Based on sequence Length According to the length of sequence being compared it is of following two types
Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between
More informationResearch Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.
Research Proposal Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family. Name: Minjal Pancholi Howard University Washington, DC. June 19, 2009 Research
More informationBustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #
Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Details of PRF Methodology In the Poisson Random Field PRF) model, it is assumed that non-synonymous mutations at a given gene are either
More informationEVOLUTIONARY DISTANCES
EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:
More informationDarwinian Selection. Chapter 7 Selection I 12/5/14. v evolution vs. natural selection? v evolution. v natural selection
Chapter 7 Selection I Selection in Haploids Selection in Diploids Mutation-Selection Balance Darwinian Selection v evolution vs. natural selection? v evolution ² descent with modification ² change in allele
More informationSTEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)
STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization) Laura Salter Kubatko Departments of Statistics and Evolution, Ecology, and Organismal Biology The Ohio State University kubatko.2@osu.edu
More informationMultiple Whole Genome Alignment
Multiple Whole Genome Alignment BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 206 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by
More informationBioinformatics. Genotype -> Phenotype DNA. Jason H. Moore, Ph.D. GECCO 2007 Tutorial / Bioinformatics.
Bioinformatics Jason H. Moore, Ph.D. Frank Lane Research Scholar in Computational Genetics Associate Professor of Genetics Adjunct Associate Professor of Biological Sciences Adjunct Associate Professor
More informationNeutral Theory of Molecular Evolution
Neutral Theory of Molecular Evolution Kimura Nature (968) 7:64-66 King and Jukes Science (969) 64:788-798 (Non-Darwinian Evolution) Neutral Theory of Molecular Evolution Describes the source of variation
More informationTree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationEndowed with an Extra Sense : Mathematics and Evolution
Endowed with an Extra Sense : Mathematics and Evolution Todd Parsons Laboratoire de Probabilités et Modèles Aléatoires - Université Pierre et Marie Curie Center for Interdisciplinary Research in Biology
More informationThe protein folding problem consists of two parts:
Energetics and kinetics of protein folding The protein folding problem consists of two parts: 1)Creating a stable, well-defined structure that is significantly more stable than all other possible structures.
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationEvolution of functionality in lattice proteins
Evolution of functionality in lattice proteins Paul D. Williams,* David D. Pollock, and Richard A. Goldstein* *Department of Chemistry, University of Michigan, Ann Arbor, MI, USA Department of Biological
More informationEVOLUTIONARY DYNAMICS AND THE EVOLUTION OF MULTIPLAYER COOPERATION IN A SUBDIVIDED POPULATION
Friday, July 27th, 11:00 EVOLUTIONARY DYNAMICS AND THE EVOLUTION OF MULTIPLAYER COOPERATION IN A SUBDIVIDED POPULATION Karan Pattni karanp@liverpool.ac.uk University of Liverpool Joint work with Prof.
More informationEvolution and Computation. Christos H. Papadimitriou The Simons Institute
Evolution and Computation Christos H. Papadimitriou The Simons Institute The Algorithm as a Lens It started with Alan Turing, 60 years ago Algorithmic thinking as a novel and productive point of view for
More informationGene regulation: From biophysics to evolutionary genetics
Gene regulation: From biophysics to evolutionary genetics Michael Lässig Institute for Theoretical Physics University of Cologne Thanks Ville Mustonen Johannes Berg Stana Willmann Curt Callan (Princeton)
More informationChapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships
Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic
More informationSUPPLEMENTARY INFORMATION
Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)
More informationWhy EvoSysBio? Combine the rigor from two powerful quantitative modeling traditions: Molecular Systems Biology. Evolutionary Biology
Why EvoSysBio? Combine the rigor from two powerful quantitative modeling traditions: Molecular Systems Biology rigorous models of molecules... in organisms Modeling Evolutionary Biology rigorous models
More informationTools and Algorithms in Bioinformatics
Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and
More information(Write your name on every page. One point will be deducted for every page without your name!)
POPULATION GENETICS AND MICROEVOLUTIONARY THEORY FINAL EXAMINATION (Write your name on every page. One point will be deducted for every page without your name!) 1. Briefly define (5 points each): a) Average
More informationWright-Fisher Models, Approximations, and Minimum Increments of Evolution
Wright-Fisher Models, Approximations, and Minimum Increments of Evolution William H. Press The University of Texas at Austin January 10, 2011 1 Introduction Wright-Fisher models [1] are idealized models
More informationBIOINFORMATICS: An Introduction
BIOINFORMATICS: An Introduction What is Bioinformatics? The term was first coined in 1988 by Dr. Hwa Lim The original definition was : a collective term for data compilation, organisation, analysis and
More informationPopulation Genetics: a tutorial
: a tutorial Institute for Science and Technology Austria ThRaSh 2014 provides the basic mathematical foundation of evolutionary theory allows a better understanding of experiments allows the development
More informationCISC 636 Computational Biology & Bioinformatics (Fall 2016)
CISC 636 Computational Biology & Bioinformatics (Fall 2016) Predicting Protein-Protein Interactions CISC636, F16, Lec22, Liao 1 Background Proteins do not function as isolated entities. Protein-Protein
More information