Compartmentalization detection

Similar documents
Big Idea #1: The process of evolution drives the diversity and unity of life

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

FIG S1: Plot of rarefaction curves for Borrelia garinii Clp A allelic richness for questing Ixodes

Chapters AP Biology Objectives. Objectives: You should know...

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Processes of Evolution

Big Idea 1: The process of evolution drives the diversity and unity of life.

AP Curriculum Framework with Learning Objectives

Correlation Between the AP Biology Curriculum Framework and CAMPBELL BIOLOGY 9e AP* Edition

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

A (short) introduction to phylogenetics

Biology 11 UNIT 1: EVOLUTION LESSON 2: HOW EVOLUTION?? (MICRO-EVOLUTION AND POPULATIONS)

CONGEN Population structure and evolutionary histories

Dr. Amira A. AL-Hosary

Population Structure

Lecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency

How robust are the predictions of the W-F Model?

Enduring understanding 1.A: Change in the genetic makeup of a population over time is evolution.

Anatomy of a species tree

Phylogenetic Tree Reconstruction

It all depends on barriers that prevent members of two species from producing viable, fertile hybrids.

C3020 Molecular Evolution. Exercises #3: Phylogenetics

1. What is the definition of Evolution? a. Descent with modification b. Changes in the heritable traits present in a population over time c.

Supplementary Figures.

Summary Outline of Topics in Curriculum Framework. and CAMPBELL BIOLOGY 9e AP* Edition

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

CHAPTER 23 THE EVOLUTIONS OF POPULATIONS. Section C: Genetic Variation, the Substrate for Natural Selection

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

AP Biology Essential Knowledge Cards BIG IDEA 1

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

Population Genetics I. Bio

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

MOLECULAR EVOLUTION AND PHYLOGENETICS SERGEI L KOSAKOVSKY POND CSE/BIMM/BENG 181 MAY 27, 2011

A A A A B B1

Map of AP-Aligned Bio-Rad Kits with Learning Objectives

I of a gene sampled from a randomly mating popdation,

Effects of Gap Open and Gap Extension Penalties

Big Idea 3: Living systems store, retrieve, transmit, and respond to information essential to life processes.

Evolution AP Biology

Neutral Theory of Molecular Evolution

Estimating effective population size from samples of sequences: inefficiency of pairwise and segregating sites as compared to phylogenetic estimates

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

A. Incorrect! Form is a characteristic used in the morphological species concept.

The Wright-Fisher Model and Genetic Drift

Understanding relationship between homologous sequences

Evaluate evidence provided by data from many scientific disciplines to support biological evolution. [LO 1.9, SP 5.3]

Population genetics snippets for genepop

Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes

7. Tests for selection

Major questions of evolutionary genetics. Experimental tools of evolutionary genetics. Theoretical population genetics.

Phylogenetic inference

Essential knowledge 1.A.2: Natural selection

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

Darwin s Observations & Conclusions The Struggle for Existence

Classical Selection, Balancing Selection, and Neutral Mutations

Evolutionary Genetics: Part 0.2 Introduction to Population genetics

Examples of Phylogenetic Reconstruction

Phylogenetic analyses. Kirsi Kostamo

Phylogeny. November 7, 2017

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057

I. Short Answer Questions DO ALL QUESTIONS

Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A

Tree Building Activity

Demography April 10, 2015

Detecting the correlated mutations based on selection pressure with CorMut

Phylogenetics: Building Phylogenetic Trees

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Whole Genome Alignment. Adam Phillippy University of Maryland, Fall 2012

Phylogenetics in the Age of Genomics: Prospects and Challenges

Using Bioinformatics to Study Evolutionary Relationships Instructions

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Session 5: Phylogenomics

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Inference of Viral Evolutionary Rates from Molecular Sequences

Introduction to population genetics & evolution

Intraspecific gene genealogies: trees grafting into networks

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Constructing Evolutionary/Phylogenetic Trees

Measuring the temporal structure in serially sampled phylogenies

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Introduction to Natural Selection. Ryan Hernandez Tim O Connor

Surfing genes. On the fate of neutral mutations in a spreading population


A. Correct! Taxonomy is the science of classification. B. Incorrect! Taxonomy is the science of classification.

West Windsor-Plainsboro Regional School District AP Biology Grades 11-12

Evolutionary Tree Analysis. Overview

Chapter 17: Population Genetics and Speciation

AP Biology Curriculum Framework

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Perplexing Observations. Today: Thinking About Darwinian Evolution. We owe much of our understanding of EVOLUTION to CHARLES DARWIN.

Mapping QTL to a phylogenetic tree

EVOLUTIONARY DISTANCES

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Supplementary Materials for

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

Potts Models and Protein Covariation. Allan Haldane Ron Levy Group

Transcription:

Compartmentalization detection Selene Zárate Date

Viruses and compartmentalization Virus infection may establish itself in a variety of the different organs within the body and can form somewhat separate viral populations, driven to adapt to their particular environments and subjected to different selective pressures Virus populations can become isolated from each other, if trafficking and gene flow between viral subpopulations is significantly restricted, then each subpopulation can become genetically distinct from others, i.e., compartmentalized.

Compartmentalization Compartmentalization has been defined in different ways: as genetic heterogeneity between subpopulations as the result of independent micro-evolution as the result of restricted viral gene flow as the presence of distinct but phylogenetically related genotypes. In HIV, compartmentalized viral populations have been shown to possess distinct phenotypic characteristics, such as cellular tropism, drug resistance, and level of pathogenesis.

How to determine population structure? A population is considered structured if: genetic drift is occurring in some of its subpopulations migration does not happen uniformly throughout the population mating is not random throughout the population. A population s structure affects the extent of genetic variation and its patterns of distribution.

Standard Genetic Parameters for Population Diversity Analysis Allele richness (A) Effective number of alleles [AE= 1/(1-HE)] Observed heterozygosity(h0) Expected heterozygosity(he) Fixation Index (FIS) Within-population gene diversity (HI) Mean within-population gene diversity (Hs) Total diversity (HT) Coefficient of gene differentiation among populations (GST)

Methods used to detect virus compartmentalization Distance-based: FST, nearest neighbor. Tree-based: Slatkin-Maddisson, Association Index, Correlation coeficients

Distance based methods

Wright s measure of population subdivision: F ST Compares the mean pairwise genetic distance between two sequences sampled from different compartments to the mean distance between sequences sampled from the same compartment. Statistical significance is derived via a population-structure randomization test.

F ST score When the differences between compartments is much larger than the differences within compartments, the values of F ST approaches 1. Therefore values of F ST close to 1 indicate compartmentalization

Distance within subpopulation = 14/36 = 0.39 Distance between subpopulation =53/81=0.65 F ST = (0.65-0.39 )/0.65 = 0.4

Nearest-neighbor statistic (S nn ) Is a measure of how often the nearest neighbors of each sequence were isolated from the same or different compartments. The distance between sequences is measured using the TN93 metric (not the number of sites in which two sequences differ, as in the original description). where Xj is 1 if the nearest neighbor was isolated from the same subpopulation or 0 otherwise.

A Sequence A was isolated from one subpopulation, 6 of its nearest neighbors are from the same population and 2 are not, its contribution to Snn is 6/9 Snn= (14(6/9) + 4 (1/9)) /18 = 0.54

Tree-based methods

Slatkin-Maddison (SM) Determines the minimum number of migration events between the separated populations consistent with the structure of the reconstructed phylogenetic tree. Statistical support is based on the number of migration events that would be expected in a randomly structured population, derived by permuting sequences between compartments.

In the phylogeny shown here, one migration event explains the distribution of the sequences in the topology. The more migration events needed to explain the distribution of sequences the compartmentalization hypothesis becomes less likely

Simmonds association index (AI) Assesses the degree of population structure in the phylogenetic tree by weighting the contribution of each internal node based on its depth in the tree (progressively less for nodes near the root) and evaluating the significance of the observed value using a bootstrap sample both over the structure of the population and the shape of the phylogenetic tree.

d = (1-0.5)/2 2-1 d = (1-0.66)/2 3-1 At each node determine the number of sequences below it (n), and the frequency of the most frequent variant (f), and calculate d = (1-f)/2 n-1. The AI is calculated as the ratio between the mean score of 100 bootstrap replicates, and the mean of 10 sample reassigned controls The smaller the score, the more likely the population is compartmentalized

Correlation coefficients (r, r b ) Correlation coefficients are a way to correlate distances between two sequences in a phylogenetic tree with the information about whether or not they were isolated from the same compartment. The distance between two sequences can be either the number of tree branches separating the sequences (r b ) or the cumulative genetic distance between the sequences (r). To assess whether the computed coefficient was statistically significant, we estimated the distribution of these coefficients by permuting sequences between compartments. A P value of 0.05 or less was considered statistically significant.

How these methods compare p0 = f yy + f nn pe = (f yy + f yn )*(f yy + f ny ) + (f nn + f ny )*(f nn + f yn ) κ = (p0 - pe)/(1 - pe) Zárate 2007. J. Virol: 81(12)

Biased sample sizes Zárate 2007. J. Virol: 81(12)

A tour around HYPHY

A quick example: Lets take two sets of data, one group of HIV sequences* derived from either plasma or female genital tract (A), and a second data set +, with samples derived from plasma or CSF (B). We aligned the sequences and reconstructed the phylogeny in order to carry out the compartmentalization analysis *Kemal, PNAS:100(22). + Gatanaga, Arch. Virol. :144(1).

Starting an analysis Open the Standard analysis menu and select Compartmentalization. We will start with an FST analysis. You will need: A sequence alignment A distinct label in the sequence name for each compartment A substitution model

F ST and S nn Population characteristics: Metapopulation diversity (pi_t) Mean subpopulation diversity (pi_s) Mean interpopulation diversity (pi_b) FST Hudson, Slatkin and Madison (Genetics 132:583-589) Slatkin (Evolution 47:264-279) Hudson, Boos and Kaplan (Mol Bio Evol 9: 138-151) Hudson (S_nn) (Genetics 155:2011-14):

Data set A Data set B

Data set A Data set B

Data set A Data set B

Data set A

Data set A Data set B

Data set A F ST = 0.84 Data set B F ST = 0.13

Slatkin-Maddison test You will need a phylogeny to carry out this analysis

Data set A

Data set A Data set B

Data set A Data set B

Association Index You will need an aligment that includes a sequence that can be cosidered as an outgroup

Association Index You will need an aligment that includes a sequence that can be cosidered as an outgroup

Association Index Data set A Data set B

Data set A Data set B

Correlation coefficients You will need to load a phylogeny to carry out this analysis.

Correlation Coefficients Data set A Data set B

Comparison between methods F ST S nn SM AI r r b Data set A Data set B

Exercises

Exercises Follow the instructions to determine if sequences from patients C, D. K, Q and S show evidence of compartmentalization

Results C D K Q S FST (HSM) 0.074 p = 0.066-0.033 p = 0.754 0.57 p =0.12 0.83 p =0 0.76 p = 0 FST (S) 0.039 p= 0.066-0.016 p= 0.754 0.3 p= 0.12 0.70 p=0 0.62 p = 0 FST (HBK) 0.039 p= 0.066-0.016 p = 0.754 0.3 p = 0.12 0.70 p=0 0.62 p = 0 Snn 0.64 p = 0.11 0.32 p = 0.952 0.72 p =0.013 0.96 p=0 1 p = 0 AI 0.71 boot = 85 1.38 boot = 1 0.63 boot = 89 0.23 boot = 100 1.2 x 10-8 boot = 100 SM 8 migrations p = 0.452 12 migrations p =0.974 9 migrations p= 0.46 2 migrations p = 0 1 migration p = 0 r -0.0012 P < 0.27-0.04 p < 0.91 0.1 p < 0.026 0.83 p < 0.00099 0.95 p < 0.00099 rb 0.021 p < 0.42-0.045 p < 0.93 0.14 p < 0.037 0.4729 p < 0.00099 0.69 p < 0.00099

Patient data C D K Q S Current CD4 (cells/mm 3 ) Plasma RNA (log copies/ml) CSF RNA (log copies/ml) CSF WBC (cells/mm 3 ) 312 55 221 68 32 5.7 5.9 5.7 5.1 6 5.2 4 4.4 3.5 3.2 312 2 16 2 3 Compartmentalized