Bioinformatics
What can sequences tell us?
- AGACCTGAGATAACCGATAC
- By themselves? Not a heck of a lot...*
- However, through comparison and analysis, combined with molecular and structural biology, they can reveal vast amounts of evolutionary information hidden away within them
- (Francis Crick, the less vocal eugenics advocate of the pair)
- Figure: annotated sequence of the human X chromosome
*Indeed, one of the key lessons of the Human Genome Project is that disease is much more complicated than the originally promised genome-based therapeutics assumed.
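How to compare sequences
- Total score of an alignment, summed over the N aligned positions: $S_{\mathrm{tot}} = \sum_{i=1}^{N} S(\sigma_i^{(1)}, \sigma_i^{(2)})$, where $\sigma_i^{(1)}$ and $\sigma_i^{(2)}$ are the residues at the ith position of sequences 1 and 2
- Scoring function for residue types i and j: $S_{ij} = \log \frac{p_{ij}}{q_i q_j}$, where $p_{ij}$ is the replacement frequency and $q_i$, $q_j$ are the amino-acid frequencies
- Empirically derived from real sequences, e.g., the BLOSUM62 scoring matrix
- PBoC 21.2.2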
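The two formulas above can be sketched in a few lines of Python. This is only an illustration: the two-letter alphabet and the frequencies `p` and `q` are made up for the example (they are NOT BLOSUM62 values), and the helper names `log_odds` and `total_score` are hypothetical. The alignment is assumed to be gap-free.

```python
import math

def log_odds(p_ij, q_i, q_j):
    """Log-odds substitution score S_ij = log(p_ij / (q_i * q_j))."""
    return math.log(p_ij / (q_i * q_j))

def total_score(seq1, seq2, score):
    """Sum the per-position scores of two equal-length, gap-free aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(score[(a, b)] for a, b in zip(seq1, seq2))

# Hypothetical two-letter alphabet with made-up frequencies (not real data).
q = {"A": 0.6, "B": 0.4}  # background frequencies
p = {("A", "A"): 0.5, ("A", "B"): 0.1,
     ("B", "A"): 0.1, ("B", "B"): 0.3}  # observed pair frequencies

# Build the scoring matrix from the frequencies.
score = {pair: log_odds(p_ab, q[pair[0]], q[pair[1]]) for pair, p_ab in p.items()}

print(total_score("AAB", "ABB", score))
```

Pairs that occur more often than chance (here A-A and B-B) get positive scores, and pairs that occur less often than chance (A-B) get negative scores.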
Phylogenetic analysis
- Figure: hemoglobin sequences and the resulting phylogenetic tree
- Sequence similarity can be used to trace ancestral lineages
- PBoC 21.4.1
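As a rough sketch of how similarity turns into a tree, the snippet below clusters sequences with UPGMA (average linkage), one simple distance-based method; it is not necessarily the method behind the hemoglobin tree on the slide. The sequences and species labels are invented toy data, not real hemoglobin sequences, and `p_distance` and `upgma` are hypothetical helper names.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(seqs):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters."""
    # Each key is a (sub)tree; each value lists the sequence names inside it.
    clusters = {name: [name] for name in seqs}

    def dist(c1, c2):
        # Average pairwise distance between all members of the two clusters.
        pairs = [(m, n) for m in clusters[c1] for n in clusters[c2]]
        return sum(p_distance(seqs[m], seqs[n]) for m, n in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters))

# Invented toy sequences (assumption for illustration only).
toy_seqs = {
    "human":   "ACGTACGTAC",
    "chimp":   "ACGTACGTAT",
    "mouse":   "ACGTTCGAAT",
    "chicken": "TCGATCGAGG",
}
print(upgma(toy_seqs))
```

In this toy data set the two most similar sequences ("human" and "chimp") are joined first, and the most divergent one ("chicken") joins last.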
Tree of life
- based on 16S rRNA taxonomy
- demonstrates that most diversity is in the microbes
- first proof of archaea as a separate evolutionary domain (only accepted a decade after it was first published!)
- determined by Carl Woese, who was dubbed "Microbiology's Scarred Revolutionary"
- Woese, C. R.; Fox, G. E. (1977-11-01). "Phylogenetic structure of the prokaryotic domain: The primary kingdoms". PNAS 74 (11): 5088-5090.
Neutral mutations and evolution
- the frequency of neutral amino-acid (and codon-redundant, i.e., synonymous) mutations also agrees with expectations
- the presence or absence of retroviral sequences inserted into DNA matches the phylogenetic tree (similarly for transposons)
- all other great apes have 24 pairs of chromosomes, while humans have 23 pairs
- genetic analysis shows that human chromosome 2 resulted from the ancestral fusion of two chromosomes
- Figure: chimp chromosomes 2p and 2q alongside human chromosome 2
From sequence to structure
- sequences of actin-like proteins in bacteria (MreB, ParM) and eukaryotes (actin) show almost ZERO similarity...yet the structures look nearly identical!
- sequence conservation implies structure conservation (BUT NOT THE CONVERSE)
- PBoC 21.2
How to compare structures
- Simple RMSD no longer works when sequence lengths differ
- QH scores an alignment based on residue-residue distances AND gaps (no sequence information!)
- Sequence-based and structure-based phylogenetic trees are in agreement: structure encodes evolutionary information as well!
- Patrick O'Donoghue, Zaida Luthey-Schulten, "Evolutionary Profiles Derived from the QR Factorization of Multiple Structural Alignments Gives an Economy of Information", (2005) JMB, 346: 875-894.
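To see why plain RMSD breaks down, here is a minimal sketch of the calculation. It assumes the two structures are already superimposed and that atom k of one corresponds to atom k of the other; the coordinates are invented and `rmsd` is a hypothetical helper name. It does not implement QH.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two already-superimposed lists of (x, y, z) positions."""
    if len(coords_a) != len(coords_b):
        # Without a one-to-one pairing of atoms, plain RMSD is undefined.
        raise ValueError("RMSD needs the same number of atoms in both structures")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Invented coordinates: structure b is structure a shifted by 1.0 along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(a, b))  # every atom is displaced by 1.0, so the RMSD is 1.0
```

Calling `rmsd(a, b[:2])` raises a `ValueError`: once the chains differ in length, an alignment (with gaps) is needed to decide which residues to pair, which is what a measure such as QH accounts for.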
Molecular paleontology
- ancient protein sequences can be reconstructed via phylogenetic analysis (NOT the same as Jurassic Park, but close!)
- the absorbance spectrum of reconstructed dinosaur rhodopsin indicates what the animal could see
- PBoC 21.4.1
- Eric Gaucher at GT structurally characterized 3-4 billion-year-old versions of an antibiotic-resistance protein (www.gauchergroup.biology.gatech.edu/)
- Figure: beta-lactamase (modern, ancestor 1, ancestor 2)
- Valeria A. Risso et al. 2013. Hyperstability and Substrate Promiscuity in Laboratory Resurrections of Precambrian β-Lactamases. J. Am. Chem. Soc., 135 (8), pp. 2899-2902.
Horizontal gene transfer
- genes are shared horizontally between species rather than only vertically (parent to offspring)
- common (even dominant?) among bacteria; can lead to, e.g., extremely fast spread of antibiotic-resistance genes
- even eukaryotes may acquire some genes horizontally, up to and including entire cells (mitochondria and chloroplasts)
- complicates attempts to draw a universal tree of life with a unique common ancestor (but does not erase it completely!)
Human accelerated regions
- If chimps and humans share > 98% of their DNA, where are the important differences?
- In the so-called Human Accelerated Regions (HARs)
- ~200 identified so far, mostly in non-coding regions, NOT genes for proteins
- For example, HAR1, the most accelerated region, is part of a novel RNA gene expressed during neocortical development
- It's all about regulation!
- Pollard KS, Salama SR, Lambert N, Lambot MA, Coppens S, Pedersen JS, Katzman S, King B, Onodera C, Siepel A, Kern AD, Dehay C, Igel H, Ares M Jr, Vanderhaeghen P, Haussler D (2006). "An RNA gene expressed during cortical development evolved rapidly in humans". Nature 443: 167-172.
How to sequence DNA?
- shotgun sequencing: multiple copies of the genome are broken up into fragments of 2-10k bases
- a PCR-like method can read 0.5-1k bases of each fragment (from both directions)
- the random short sequences are examined for overlaps and computationally reassembled into one long sequence
- the Human Genome Project (public) used hierarchical shotgun sequencing, where libraries of 100-300k-base fragments were first created and then shotgun-sequenced
- the Celera Genomics project (private) used whole-genome shotgun sequencing
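The overlap-and-reassemble step can be illustrated with a toy greedy assembler. This is a sketch under strong assumptions (error-free reads, a single strand, no repeats); real assemblers are far more elaborate. The three reads are cut by hand from the 20-base example sequence on the earlier slide, and `overlap` and `greedy_assemble` are hypothetical helper names.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest suffix-prefix overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave the contigs unjoined
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for n, r in enumerate(reads) if n not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Three overlapping 10-base reads covering AGACCTGAGATAACCGATAC.
reads = ["AGACCTGAGA", "TGAGATAACC", "TAACCGATAC"]
print(greedy_assemble(reads))
```

Each merge keeps the first read whole and appends only the non-overlapping tail of the second, so the five-base overlaps here are counted once in the final contig.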
Next (and next)-gen sequencing methods
- Human Genome Project cost $2.7 billion and took 13 years (1990-2003)
- goal for personalized medicine is (was) a $1000 genome; with that challenge now met, the new goal is < $100
- One promising technique: nanopore sequencing, e.g., using alpha-hemolysin (left) or MspA (above)
- computational modeling/simulation is necessary to interpret experiments (group of Alek Aksimentiev, UIUC)
Epigenetics - beyond sequence alone
- Modifications to DNA other than sequence changes can also influence expression
- in some cases they are even heritable (some definitions include heritability as a requirement)
Epigenetics
- DNA methylation is a key example of epigenetic control
- Figure: same genes, different tail kink
- hypermethylation is involved in some cancers
- methylation can both increase and decrease the stability of DNA strands, depending on spacing and frequency
- Xueqing Zou, Wen Ma, Ilia Solov'yov, Christophe Chipot and Klaus Schulten, "Recognition of methylated DNA through methyl-CpG binding domain proteins", Nucleic Acids Research, 40:2747-2758, 2012