Natural Selection. Nothing in Biology makes sense, except in the light of evolution. T. Dobzhansky


It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent upon each other in so complex a manner, have all been produced by laws acting around us. Charles Darwin

Natural Selection
"Nothing in Biology makes sense, except in the light of evolution." T. Dobzhansky
Charles Darwin, 1859, The Origin of Species
- 3 key ingredients for adaptation by natural selection
  - Exponential growth of populations
  - Struggle for existence: limited capacity for any population
  - Variable, heritable survival and reproduction
- The unity of life: all species have descended from other species
- Builds on Malthus, An Essay on the Principle of Population, 1798
- Domestic breeding shows hereditary modification is possible
- Fitness is a characteristic of individuals
- Natural selection operates on populations
- Fitness is defined only for a particular environment; environments always change because species form the selective environments of other species
- Is "survival of the fittest" a circular statement?
- Is natural selection an optimization process?

Natural Selection
Natural selection is
- often slow, but arms races result in complex, wonderful, bizarre (and stupid) things
- something that can lead to cooperation
- (largely) based on the fitness of reproductive individuals
Natural selection is not
- learned behavior passed on
- group selection (Dawkins: selection acts on genes & on individuals, not groups). Exceptions?
There's a lot we don't know about evolution
- The role of symbiosis & cooperation
- The right definition of species
Darwin did not have a mechanism that allowed for heritable, variable fitness
- Genes: strings of DNA that get transcribed to RNA, translated to proteins and expressed as phenotype
- A string of molecular symbols, AACCGGTAGTCTATGCTAGTGGGGTTTTAATAAT, is turned into a protein that makes your hair brown, curly or fall out when you're 30

Genetics
- Mendel showed that genes exist by breeding pea plants
  - Genes exist as recessives and dominants, one copy from each parent
  - Given a dominant AA mom and a recessive aa dad, offspring are all Aa and look like mom
  - Variation comes from combining genes from mom (BbCCddZz) and dad (bbccddzz)
- In 1953 Watson & Crick & Rosalind Franklin discover the molecular structure of DNA
  - DNA: the molecule that carries the heritable information
  - Mutations, sex and crossing over in DNA provide the variation
- Every cell in your body has about 3 billion bp of DNA (per genome copy) that is transcribed into RNA and translated into proteins
- Proteins do all the work: make your eyes blue, your hair curly, your muscles strong, your heart pump
- DNA is arranged into genes on chromosomes
  - Humans have 23 chromosomes, 2 copies each (46)
  - Fits by supercoiling: 2-3 m of DNA per cell; laid end to end, the DNA in your body would reach the Sun and back about 70 times!

What mechanisms allow for heritable, variable fitness?
Heritable
- Genes: encoded in DNA, transcribed to RNA, translated to proteins whose expression determines fitness
Variable
- Mutations: copies are not perfect
- Sex: genes are combined from 2 parents
- Crossing over allows for many different possible combinations

DNA
- DNA = deoxyribonucleic acid
- Unit: nucleotide
  - Sugar ring with a base (A, T, G, C) and a phosphate group
  - Bases: Adenine, Thymine, Cytosine, Guanine
  - Base pairing: A-T, G-C
- Every cell in your body has about 3 billion bp of DNA (per genome copy) that is transcribed into RNA and translated into proteins
- DNA is arranged into genes on chromosomes
- Humans have 23 chromosomes, 2 copies each
- Fits by supercoiling: 2-3 m of DNA per cell; laid end to end, the DNA in your body would reach the Sun and back about 70 times!

The Central Dogma

RNA codon table
4 bases, 3 per codon = 4^3 codons = 64 codons
20 amino acids (redundancy is possible)
This table shows the 64 codons and the amino acid each codon codes for. The direction is 5' to 3'.

Ala/A   GCU, GCC, GCA, GCG
Arg/R   CGU, CGC, CGA, CGG, AGA, AGG
Asn/N   AAU, AAC
Asp/D   GAU, GAC
Cys/C   UGU, UGC
Gln/Q   CAA, CAG
Glu/E   GAA, GAG
Gly/G   GGU, GGC, GGA, GGG
His/H   CAU, CAC
Ile/I   AUU, AUC, AUA
Leu/L   UUA, UUG, CUU, CUC, CUA, CUG
Lys/K   AAA, AAG
Met/M   AUG
Phe/F   UUU, UUC
Pro/P   CCU, CCC, CCA, CCG
Ser/S   UCU, UCC, UCA, UCG, AGU, AGC
Thr/T   ACU, ACC, ACA, ACG
Trp/W   UGG
Tyr/Y   UAU, UAC
Val/V   GUU, GUC, GUA, GUG
START   AUG
STOP    UAG, UGA, UAA

Strings of amino acids -> Proteins
- Primary, secondary and tertiary structure
- Proteins do all the work, but 99% of human DNA is not translated into protein
- Why carry around all that junk?
  - Some is not expressed in some cells or conditions
  - Some is evolution's playground

Variation in DNA
How can the genetic content of a strand of DNA change?
- Mutagens: many types of direct mutations
  - UV, particle radiation, oxygen radicals, other chemicals
- Sex (Mendelian genetics)
- Chromosomal crossing over during meiosis
- Gene exchange via gene transfer in bacteria
- Viral DNA insertion and exchange (viruses do not have the cellular machinery to reproduce their genomes, so they use ours; mistakes happen)
- Many ways we don't understand

Crossing over
- Each cell has 2 copies of every gene, but sperm and eggs each have 1.
- The process of creating 1 from 2 is meiosis (with crossing over)
In sexual reproduction
Mom: AAACATCCGTTAA (tall, blue eyes, no toe hair)
Dad: AGGCCTTCCGGAA (short, brown eyes, hairy toes)
Offspring: AAACATTCCGGAA -> tall, brown eyes, hairy toes
A new offspring is created by combining 1 chromosome from an egg and 1 from a sperm
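The Mom/Dad example can be read as a single-point crossover. The sketch below assumes the cut falls after the sixth base; the function name and the cut position are illustrative choices, not something the slide specifies.

```python
def single_point_crossover(parent_a: str, parent_b: str, point: int) -> str:
    """Return a child made of parent_a before `point` and parent_b from `point` on."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 <= point <= len(parent_a):
        raise ValueError("crossover point out of range")
    return parent_a[:point] + parent_b[point:]


mom = "AAACATCCGTTAA"
dad = "AGGCCTTCCGGAA"
# Assumed cut point: after the sixth base.
child = single_point_crossover(mom, dad, 6)
print(child)
```

Real meiotic crossing over swaps segments between paired chromosomes and can happen at several points; the single cut here is only the simplest case.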

Summary: Genetics & Natural Selection
3 key ingredients for adaptation by natural selection
- Exponential growth of populations
- Struggle for existence: limited capacity for any population
- Variable, heritable survival and reproduction
Genetics: a discrete 4-letter alphabet (AGCT), packaged into genes, that code for proteins
Variation and heredity
- Letters can change: mutations, insertions, deletions
- Chromosomes cross over to create sperm & eggs
- Sperm and eggs combine to make new offspring
- Each cell has the same DNA
- In a tremendously complicated process, DNA is transcribed into RNA and RNA is translated into proteins that cause phenotype

4 billion years ago
- A proto-bacterium made a copy of itself
- A long time (bacteria can reproduce in 20 minutes)
- A lot of individuals
- A very good (and inevitable) process
  - Massively parallel search
  - Partial solutions are conserved
  - Arms races
  - Molecules for storing info, processing info, doing work
- Result: you, me & billions of species
Discussion: natural selection as a complex adaptive system

DNA: ...ATG GCT GTT CAG TAG CGT...
RNA: AUG GCU GUU CAG UAG CGU
Protein: Met Ala Val Gln Stop Arg
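The DNA -> RNA -> protein lookup above can be sketched in a few lines. This is a minimal illustration, assuming the DNA shown is the coding strand (so transcription is just T -> U) and using the standard genetic code. The names `transcribe`, `translate` and `stop_at_stop` are made up for this example.

```python
# Standard genetic code, with codons ordered by base in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: replace T with U (a simplifying assumption)."""
    return dna.upper().replace("T", "U")


def translate(rna: str, stop_at_stop: bool = True) -> str:
    """Read codons left to right; '*' marks a stop codon."""
    protein = []
    for start in range(0, len(rna) - len(rna) % 3, 3):
        amino_acid = CODON_TABLE[rna[start:start + 3]]
        if amino_acid == "*" and stop_at_stop:
            break  # a ribosome releases the chain at the first stop codon
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGGCTGTTCAGTAGCGT")
print(rna)
print(translate(rna))                      # stops at UAG
print(translate(rna, stop_at_stop=False))  # codon-by-codon lookup, as on the slide
```

With `stop_at_stop=True` the final CGU (Arg) codon is never reached, because UAG ends the chain. The slide lists it only as a codon-by-codon lookup.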

Key Concepts Discussion
- Introduction to The Origin of Species
- The Central Dogma

Genetic Algorithms
Principles of natural selection applied to computation:
- Variation
- Selection
- Inheritance
Evolution in a computer:
- Individuals (genotypes) stored in the computer's memory as bit strings
- Evaluation of individuals (artificial selection)
- Differential reproduction through copying and deletion
- Variation introduced by analogy with mutation and crossover

Genetic Algorithms
Initialize a population, P
Repeat
  Create an empty population, P'
  Select 2 individuals from P based on fitness
  Apply mutation, mating, crossover
  Add the individuals to P'
  Set P = P'
(P at T_n -> P' -> P at T_n+1)

- Define the individuals (string of bits or letters; representation matters)
- Define a fitness function that evaluates the string
- Define the rules for
  - selecting individuals (e.g. roulette or tournament)
  - mutation (e.g., some % of bits flip)
  - mating (usually 2 parents)
  - crossover (e.g., probability of crossing over at each position; 1 point, 2 point, n point)

Fitness functions
- Raw fitness: f_raw = % of correct bits; selective pressure decreases as the answer gets closer
- Scaled fitness: 2^(f_raw), with f_raw counted in correct letters, so one more correct letter is twice as fit (only works in simple cases)
- Normalized fitness: fitness divided by the average fitness in the population
Selection methods
- Fitness proportionate (roulette wheel): probability of appearing in the next population P' is proportional to normalized fitness
- Tournaments: pick (usually) 2 individuals from P, compare fitness, put the more fit individual in P'. Sample with replacement.
- Elitism: guarantee the best x solutions will appear in P'
- Implicit fitness in agent-based models: ability to reproduce determines fitness
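The selection methods listed above can be sketched as small functions over a population and a parallel list of fitness values. This is one possible reading, not a reference implementation. All function names are illustrative, and the random generator is passed in explicitly so runs can be reproduced.

```python
import random


def normalized(fitnesses):
    """Divide each fitness by the population average."""
    mean = sum(fitnesses) / len(fitnesses)
    return [f / mean for f in fitnesses]


def roulette_select(population, fitnesses, rng):
    """Fitness-proportionate selection: chance of being picked is proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def tournament_select(population, fitnesses, rng, size=2):
    """Pick `size` individuals at random (with replacement) and keep the fittest."""
    contenders = [rng.randrange(len(population)) for _ in range(size)]
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]


def elite(population, fitnesses, x):
    """Return the best x individuals, to be copied unchanged into the next population."""
    ranked = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)
    return [population[i] for i in ranked[:x]]


rng = random.Random(0)
pop = ["aaaa", "aaab", "aabb", "abbb"]
fit = [0.25, 0.5, 0.75, 1.0]
print(normalized(fit))
print(roulette_select(pop, fit, rng), tournament_select(pop, fit, rng), elite(pop, fit, 1))
```

Roulette selection needs non-negative fitness values with a positive total; tournaments only need fitness values that can be compared.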

Simple example: evolve a string
Find the string "Furious green ideas sweat profusely"
There are 27^35 possible strings
GA: 500 strings, crossover rate 75%, mutation rate 1%

Time, avg f, best f, best string
0, .035, .20, pjrmrubynrksiidwctxfodkodjjzfunpk
1, .070, .26, pjrmrubynrksxiidnybvswcqo piisyexdt
26, .72, .80, qurmous green idnasvsweqt prifuseky
42, .90, .97, qurious green ideas sweat profusely
46, .94, 1, curious green ideas sweat profusely

Massively parallel directed search is effective when there is 1 correct answer.
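A string-evolving GA along these lines can be sketched as follows. It is not the program that produced the slide's run: the selection scheme (binary tournaments plus one elite copy), the single-point crossover, the lowercase target and every function name are assumptions made for illustration. Only the population size, crossover rate and mutation rate come from the slide.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "           # 27 symbols
TARGET = "furious green ideas sweat profusely"    # 35 characters, lowercased


def fitness(candidate, target=TARGET):
    """Raw fitness: fraction of positions that match the target."""
    return sum(c == t for c, t in zip(candidate, target)) / len(target)


def mutate(candidate, rate, rng):
    """Replace each character with a random symbol with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c
                   for c in candidate)


def crossover(a, b, rng):
    """Single-point crossover at a random interior position."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(pop_size=500, crossover_rate=0.75, mutation_rate=0.01,
           max_generations=500, seed=0):
    """Return (generation, best string) when the target is found or time runs out."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)]
    for generation in range(max_generations):
        best = max(pop, key=fitness)
        if best == TARGET:
            return generation, best
        new_pop = [best]                                # elitism: keep the best string
        while len(new_pop) < pop_size:
            a = max(rng.sample(pop, 2), key=fitness)    # binary tournament
            b = max(rng.sample(pop, 2), key=fitness)
            child = crossover(a, b, rng) if rng.random() < crossover_rate else a
            new_pop.append(mutate(child, mutation_rate, rng))
        pop = new_pop
    return max_generations, max(pop, key=fitness)
```

Calling `evolve()` runs with the slide's parameters. The number of generations needed depends on the seed and on the assumed selection scheme, so it will not reproduce the table above row for row.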

How big a search has life conducted on earth?
A combinatorial optimization problem (sort of)
- How many bacteria on earth: 10^x
- How many days would it take to produce that many bacteria from a single cell: 10^y
- How many bacteria could have been produced in 3.5 billion years, in an infinite world: 10^z
x, y & z are integers between 1 and 1 quadrillion
- Initial guesses from each group form P (an x,y,z triplet)
- I will eliminate the least fit guesses from P
- If you remain in P, your next guess is your last guess with up to 1 mutation & 2 crossings over with other guesses still in P
- If you were eliminated, your guess must be formed from up to 1 mutation and 2 crossings over from 2 members of P
- Valid mutations: add/subtract 1, or multiply/divide by 10 (90 can go to 90, 9, 900, 89 or 91)
Populations P at t=1, t=2 and t=3:
- Guesses with fitness F: 5K.1tril.100tril (F=2); 10.1mil.1bil (F=7); 1000.1mil.1bil (F=5); 10K.10mil.10bil (F=5); 2K.5K.10 (F=-1); 1000.1bil.1quad (F=0)
- Other guesses: 30.500.600000; 300.1000.100000; 251.50000.600000; 2510.5000.60000; 251.5000.60000; 50.500.60000; 250.500.60,000; 300.1000.60000; 30.1000.60,000

Whitman et al., PNAS 1998, estimate 5 × 10^30 bacteria on the planet
- Events that would occur once in 10 billion years in the laboratory would occur every second in nature.
- 1 × 10^3 bacterial generations per year (1 every 3 days)
- 3.5 × 10^9 years of evolution
- ~10^43 bacteria have lived on earth
How long to produce 10^30 bacteria: 150 days ≈ 10^2 days
How many bacteria could have been produced in 3.5 billion years?
- 3 trillion generations
- 2^3,000,000,000,000 bacteria = 10^(900 billion) bacteria
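The orders of magnitude on this slide can be checked with a few lines of arithmetic. The sketch assumes the slide's round figures (10^3 generations per year, 3.5 × 10^9 years, 5 × 10^30 cells alive at once) and treats "bacteria that have ever lived" as standing population times number of generations, which is one reading of the slide rather than a stated formula.

```python
import math

generations_per_year = 1e3
years_of_evolution = 3.5e9
bacteria_alive_now = 5e30

# About 3.5 trillion generations since life began.
total_generations = generations_per_year * years_of_evolution

# Standing population times generations: on the order of 10^43 cells ever.
bacteria_ever_lived = bacteria_alive_now * total_generations

# Doublings needed to go from 1 cell to 10^30 cells: about 100.
doublings_to_1e30 = math.log2(1e30)

# Unbounded doubling for 3 trillion generations: 2^(3e12) = 10^(3e12 * log10(2)).
unbounded_exponent = 3e12 * math.log10(2)

print(f"{total_generations:.2e} generations")
print(f"{bacteria_ever_lived:.2e} bacteria ever lived")
print(f"{doublings_to_1e30:.1f} doublings to reach 1e30 cells")
print(f"2^(3e12) is about 10^{unbounded_exponent:.3e}")
```

The gap between roughly 10^43 cells that actually lived and the 10^(900 billion) that unbounded doubling would give is the "struggle for existence": almost every potential lineage is cut off by limited capacity.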

Reading for Monday
- Stephanie Forrest, "Genetic Algorithms: Principles of Natural Selection Applied to Computation", Science, Vol. 261, No. 5123 (Aug. 13, 1993), pp. 872-878.
- Gray codes, schemas, function & combinatorial optimization, selecting good parameters/rules for your GA
TALK ON SATURDAY, 5:30 pm, Hibben 105 (just south of the Anthropology Department)
Professor Steve Lansing, "Perfect Order: Recognizing Complexity in Bali"

Guidelines for implementing GAs
- Define the individuals (string of bits or letters; representation matters)
- Define a fitness function that evaluates the string (explicitly or implicitly, e.g. in an ABM)
- Define the rules for
  - selecting individuals (e.g. roulette or tournament, often with elitism)
  - mutation (e.g., some % of bits flip)
  - mating (usually 2 parents)
  - crossover (e.g., probability of crossing over at each position; 1 point, 2 point, n point)
Parameters (rough guidelines from De Jong 1975 GA experiments on a particular suite of problems):
- Bitstring length: 32-10,000
- Population size: 100-1000
- Length of run: 50-10,000
- Single-point crossover rate: 0.6 per pair of parents
- Mutation rate: 0.005 per bit

GA-evolved GA parameters
Grefenstette 1986 found that smaller populations with more crossover & mutation maximized average fitness
- Population: 30
- Crossover: 0.95
- Mutation: 0.01
- Elitism
Schaffer, Caruana, Eshelman & Das 1989: bigger test set of numerical optimization problems, Gray coding
- Population: 20-30
- Crossover: 0.75-0.95
- Mutation: 0.005-0.01

Evolving parameters over time
Davis 1989, 1991: let mutation & crossover rates evolve
- Allows values of operators that improve fitness to become more common in the population
- Operators have fitness based on the fitness of individuals containing that operator or descended from such individuals
- An operator, O, is chosen proportional to its fitness to create a new individual, i, that replaces an unfit individual
- If F(i) > current F_max, the fitness of O_i is incremented. The fitness of O_p(i), O_p(p(i)) (the operators that created the parents, grandparents, etc. of i) is also incremented.
- Improves GA performance on some problems, including evolving weights on neural nets
- How well does the population of operators represent current usefulness? Depends on the parameters

Messy GAs (Goldberg et al. 1989, 1990, 1993)
Improve function optimization by building long strings from well-tested building blocks
Example: simple function optimization GA
- F(x,y) = yx^2 - x^4
- Represent x and y as 3-bit Gray-coded strings
- E.g., F(001111) = F(1,5) = 5*1^2 - 1^4 = 4
Problem: can't increase string length to keep building blocks
Solution: encode position in the representation as (position, value)
- Individual above: {(1,0),(2,0),(3,1),(4,1),(5,1),(6,1)}
- Another individual: {(3,1),(3,0),(3,1),(4,0),(4,1),(3,1),(2,0),(1,1)}
- How do you evaluate the fitness of this individual?
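The worked value F(001111) = 4 can be reproduced by decoding each 3-bit Gray code and evaluating the function, read here as F(x, y) = y·x² − x⁴. This is a minimal sketch; `gray_to_int` and the capitalized `F` are names chosen for this example.

```python
def gray_to_int(bits: str) -> int:
    """Decode a Gray-coded bit string (most significant bit first) to an integer."""
    value = 0
    running = 0
    for ch in bits:
        running ^= int(ch)           # each binary bit is the XOR of the Gray bits so far
        value = (value << 1) | running
    return value


def F(individual: str) -> int:
    """Evaluate y*x^2 - x^4 for a 6-bit individual: 3 Gray bits for x, then 3 for y."""
    x = gray_to_int(individual[:3])
    y = gray_to_int(individual[3:])
    return y * x**2 - x**4


print(gray_to_int("001"), gray_to_int("111"))  # x and y for the slide's individual
print(F("001111"))
```

Gray coding is used because neighbouring integers differ in exactly one bit, so a single-bit mutation can always move x or y by one step.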

Messy GAs cont.
Another individual: {(3,1),(3,0),(3,1),(4,0),(4,1),(3,1),(2,0),(1,1)}
1. Use the leftmost assignment for each position
2. Fill in unspecified bits with *
Evaluated as {(1,1),(2,0),(3,1),(4,0),(5,*),(6,*)} = 1010** (a schema)
Now you have to evaluate the fitness of a schema, F(1010**)
Use competitive templates
- First, use the GA to find a string S that is a local optimum (for which n bit flips do not improve fitness)
- Second, use bits from S to fill in unspecified bits to evaluate. E.g. if S = 110010, evaluate F(101010)
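The two evaluation rules and the competitive-template step can be sketched directly on the slide's example. The function names are illustrative, and positions are 1-based as on the slide.

```python
def to_schema(genes, length):
    """Collapse (position, value) pairs into a schema string.

    The leftmost assignment for each position wins; unspecified positions become '*'.
    """
    schema = ["*"] * length
    for position, value in genes:
        if schema[position - 1] == "*":      # ignore later, conflicting assignments
            schema[position - 1] = str(value)
    return "".join(schema)


def fill_from_template(schema, template):
    """Replace each '*' with the bit at the same position in the template string."""
    return "".join(t if s == "*" else s for s, t in zip(schema, template))


individual = [(3, 1), (3, 0), (3, 1), (4, 0), (4, 1), (3, 1), (2, 0), (1, 1)]
schema = to_schema(individual, 6)
print(schema)
print(fill_from_template(schema, "110010"))
```

The filled-in string is an ordinary fixed-length individual, so it can be scored with whatever fitness function the problem already defines.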

Messy GAs continued: 2 phases
Primordial: create a population of small, fit, simple strings
- List all schemas with k specified bits, for a string length l. For k = 3, l = 6, these include {(1,0),(2,0),(3,0)}, {(1,0),(2,0),(3,1)}, {(1,1),(2,1),(3,1)}, {(1,0),(2,0),(4,0)}, {(4,1),(5,1),(6,1)}
- Use selection to evaluate fitness (w/o mutation or crossover) to cull the population
Juxtapositional: splice and dice the small strings in P
- E.g. splice the 4th and 5th strings listed above: {(1,0),(2,0),(4,0),(4,1),(5,1),(6,1)}
- E.g. cut to create {(1,0),(2,0)} and {(4,0),(4,1),(5,1),(6,1)}
- Evaluate fitness of these new strings

Problems with messy GAs
- The biggest problems are specifying k and evaluating all schemas with k specified bits: $n = 2^k \binom{l}{k} = 2^k \, \frac{l!}{k!\,(l-k)!}$
- The initialization bottleneck is evaluating all those schemas
- There are some methods to reduce the necessary evaluations, but
- This technique only works well when k is small, e.g. when low-order schemas are good enough to solve the posed problem
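To get a feel for the initialization bottleneck, the schema count can be computed and the schemas enumerated. The sketch reads the count as n = 2^k · C(l, k): choose which k of the l positions are specified, then a bit for each. That reading and the function names are assumptions.

```python
from itertools import combinations, product
from math import comb


def count_schemas(k, l):
    """Number of schemas with exactly k specified bits in a string of length l."""
    return 2**k * comb(l, k)


def enumerate_schemas(k, l):
    """Yield every such schema as a list of (position, value) pairs, positions 1-based."""
    for positions in combinations(range(1, l + 1), k):
        for values in product((0, 1), repeat=k):
            yield list(zip(positions, values))


print(count_schemas(3, 6))                 # the k = 3, l = 6 case used earlier
print(next(enumerate_schemas(3, 6)))
print(count_schemas(3, 32), count_schemas(6, 32))
```

Even at l = 32 the count grows from tens of thousands at k = 3 to tens of millions at k = 6, which is why the approach is practical only when k is small.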

How complicated is the Biological GA?
- Genotype: 4-letter alphabet -> triplet codons -> amino acids (with redundancy) -> folds into a protein with unpredictable structure -> phenotype varies with cellular environment
- Variation by mutation, insertion, deletion: each copy has the potential for small and large effects on phenotype
- Position of genes doesn't matter* but position of codons matters
- Genes interact in gene regulatory networks