University of Dundee. Published in: Molecular Biology and Evolution. DOI: 10.1093/molbev/msw235. Publication date: 2016


University of Dundee

Progressive and biased divergent evolution underpins the origin and diversification of peridinin dinoflagellate plastids
Dorrell, Richard G.; Klinger, Christen M.; Newby, Robert J.; Butterfield, Erin; Richardson, Elisabeth; Dacks, Joel B.; Howe, Christopher J.; Nisbet, R. Ellen R.; Bowler, Chris

Published in: Molecular Biology and Evolution
DOI: 10.1093/molbev/msw235
Publication date: 2016
Document Version: Accepted author manuscript

Citation for published version (APA): Dorrell, R. G., Klinger, C. M., Newby, R. J., Butterfield, E. R., Richardson, E., Dacks, J. B., ... Bowler, C. (2016). Progressive and biased divergent evolution underpins the origin and diversification of peridinin dinoflagellate plastids. Molecular Biology and Evolution, 34(2), 361-379. DOI: 10.1093/molbev/msw235

General rights
Copyright and moral rights for the publications made accessible in Discovery Research Portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. Users may download and print one copy of any publication from Discovery Research Portal for the purpose of private study or research. You may not further distribute the material or use it for any profit-making activity or commercial gain. You may freely distribute the URL identifying the publication in the public portal.

Take down policy
If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

MBE Advance Access published November 4, 2016

Articles: Discoveries

Progressive and biased divergent evolution underpins the origin and diversification of peridinin dinoflagellate plastids

Richard G. Dorrell 1+, Christen M. Klinger 2, Robert J. Newby 3, Erin R. Butterfield 4,5, Elisabeth Richardson 2, Joel B. Dacks 2, Christopher J. Howe 6, R. Ellen R. Nisbet 6, and Chris Bowler 1

1 Ecole Normale Supérieure, PSL Research University, Institut de Biologie de l'Ecole Normale Supérieure (IBENS), CNRS UMR 8197, INSERM U1024, 46 rue d'Ulm, F-75005 Paris, France
2 Department of Cell Biology, University of Alberta
3 Department of Biology, Middle Tennessee State University
4 Department of Biochemistry, Pennsylvania State University
5 School of Life Sciences, University of Dundee
6 Department of Biochemistry, University of Cambridge

Contributed equally to this manuscript
+ To whom correspondence should be addressed: dorrell@biologie.ens.fr

The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Abstract

Dinoflagellates are algae of tremendous importance to ecosystems and to public health. The cell biology and genome organisation of dinoflagellate species is highly unusual. For example, the plastid genomes of peridinin-containing dinoflagellates encode only a minimal number of genes arranged on small elements termed "minicircles". Previous studies of peridinin plastid genes have found evidence for divergent sequence evolution, including extensive substitutions, novel insertions and deletions, and use of alternative translation initiation codons. Understanding the extent of this divergent evolution has been hampered by the lack of characterised peridinin plastid sequences. We have identified over 300 previously unannotated peridinin plastid mRNAs from published transcriptome projects, vastly increasing the number of sequences available. Using these data, we have produced a well-resolved phylogeny of peridinin plastid lineages, which uncovers several novel relationships within the dinoflagellates. This enables us to define changes to plastid sequences that occurred early in dinoflagellate evolution, and that have contributed to the subsequent diversification of individual dinoflagellate clades. We find that the origin of the peridinin dinoflagellates was specifically accompanied by elevations both in the overall number of substitutions that occurred on plastid sequences, and in the Ka/Ks ratio associated with plastid sequences, consistent with changes in selective pressure. These substitutions, alongside other changes, have accumulated progressively in individual peridinin plastid lineages. Throughout our entire dataset, we identify a persistent bias towards non-synonymous substitutions occurring on sequences encoding photosystem I subunits and stromal regions of peridinin plastid proteins, which may have underpinned the evolution of this unusual organelle.

Introduction

Dinoflagellates are eukaryotes with immense ecological and evolutionary importance. They include photosynthetic, heterotrophic, mixotrophic, and parasitic representatives (Dorrell and Howe 2015). The photosynthetic species are a major component of plankton communities in marine and freshwater environments (Leterme, et al.; de Vargas, et al. 2015), and include causative agents of harmful algal blooms (Protoperidinium, Ceratium) (Hallegraeff 2010; Hinder, et al. 2012), symbionts of corals (Symbiodinium) and marine protozoa (Pelagodinium, Brandtodinium) (Siano, et al. 2010; Probert, et al. 2014). The non-photosynthetic species include parasites of marine invertebrates (Hematodinium, Syndiniales, and ellobiopsids) (Gornik, et al. 2012), and bioluminescent phagotrophs (Noctiluca) (Nakamura 1998). Dinoflagellates are members of the alveolates, a group that additionally contains important laboratory model species (ciliates such as Paramecium and Tetrahymena), pathogens of terrestrial and marine animals (e.g., the apicomplexan parasites Plasmodium and Toxoplasma, and the mollusc pathogen Perkinsus), and ecologically significant heterotrophs (ciliates and gregarines). Two other photosynthetic alveolates have been described (the "chromerids" Chromera and Vitrella), both of which are closely related to the apicomplexans, and serve as model organisms for understanding the origins of parasitism in this lineage (Janouskovec, et al. 2010; Dorrell and Howe 2015).

Dinoflagellates are renowned for their unusual and distinctive cell biology (Lin 2011; Wisecaver and Hackett 2011; Dorrell and Howe 2015). Unlike those of all other studied eukaryotes, dinoflagellate chromosomes are maintained in a permanently condensed state, and do not principally utilise histones for DNA packaging (Gornik, et al. 2012). The mitochondrial genomes of dinoflagellates and apicomplexans are highly reduced, and several otherwise well-conserved subunits of the ATP synthase complex have not been documented in any dinoflagellates (Butterfield, et al. 2013; Janouškovec, et al. 2013).

One of the oddest traits of dinoflagellates is their possession of unusual plastids. While all other major eukaryotic groups are either non-photosynthetic, or contain only one plastid lineage, at least four and potentially as many as seven phylogenetically distinct plastid lineages have been documented across the dinoflagellates (Janouskovec, et al. 2010; Dorrell and Howe 2015). The majority of photosynthetic dinoflagellates possess plastids that contain the light harvesting pigment peridinin (Haxo, et al. 1976). These "peridinin plastids" are most likely of red algal origin, although the exact endosymbiotic events through which they originated remain debated (Keeling 2010; Ševčíková, et al. 2015). Phylogenetic studies have indicated that the peridinin plastid was present in a common ancestor of dinoflagellates, chromerids, and apicomplexans, and that the alternative plastids found in some dinoflagellates have originated through serial endosymbiosis (Janouskovec, et al. 2010; Dorrell and Howe 2015).

The genome of the peridinin plastid is the smallest known from a photosynthetic plastid, retaining only twelve protein-coding genes (Barbrook, et al. 2014; Mungpakdee, et al. 2014), plus ribosomal RNA and some transfer RNA genes (Barbrook, et al. 2006; Nelson, et al. 2007). The protein coding genes solely encode core subunits of the photosynthetic electron transport machinery, comprising genes encoding six subunits of photosystem II (psbA, psbB, psbC, psbD, psbE, psbI), and two subunits each of photosystem I (psaA, psaB), cytochrome b6/f (petB, petD) and plastid ATP synthase (atpA, atpB). These genes are located on small elements termed "minicircles" (of 1600-6600 bp length) (Zhang, et al. 1999; Nelson and Green 2005) and "microcircles" (of 400-600 bp length) (Nisbet, et al. 2004). Minicircles typically contain single genes, although minicircles that contain no genes (Nisbet, et al. 2004; Nelson, et al. 2007), or combinations of multiple genes have also been identified in multiple dinoflagellate species (Hiller 2001; Nisbet, et al. 2004; Nelson, et al. 2007). All other genes of plastid origin that have been documented in peridinin dinoflagellates are located in the nucleus (Morse, et al. 1995; Bachvaroff, et al. 2004; Mungpakdee, et al. 2014). Some minicircles contain sizeable open reading frames of unknown function that are unique to dinoflagellates (Barbrook, et al. 2001; Nisbet, et al. 2004; Barbrook, et al. 2006). It has been proposed that some peridinin plastids contain genes encoding ribosomal proteins (rpl28, rpl33) and iron-sulfur cluster biogenesis factors (ycf16, ycf24), which were acquired by horizontal gene transfer from non-photosynthetic bacteria, although whether these sequences are genuinely plastid-encoded remains controversial (Moszczynski, et al. 2012; Dorrell and Howe 2015).

Alongside the extreme level of reduction observed in the peridinin plastid genome, the transcript sequences produced in peridinin plastids are highly unusual. This is in part due to the unusual transcript processing machinery associated with peridinin plastids, which includes (in some species) extensive in-frame sequence editing (Zauner, et al. 2004; Dorrell and Howe 2015) and (in all documented species) the addition of a 3' poly(U) tail to plastid mRNAs (Wang and Morse 2006; Barbrook, et al. 2012), which contrast with the 3' poly(A) tail and 5' spliced-leader sequences added to transcripts in dinoflagellate nuclei (Zhang, et al. 2007). In addition, individual genes within peridinin plastids are highly divergent from orthologues from other plastid lineages (Shalchian-Tabrizi, et al. 2006; Pochon, et al. 2014). Genes located in peridinin plastids frequently contain extensive sequence substitutions (Barbrook, et al. 2014) and in-frame insertions and deletions (Barbrook, et al. 2006; Barbrook, et al. 2014), have unusual codon usage preferences (Inagaki, et al. 2004; Bachvaroff, et al. 2006) and use a range of alternative translation initiation codons in addition to ATG (ATA, ATT, GTA, TTG) (Zhang, et al. 1999; Barbrook, et al. 2014).

This project was conceived to investigate the timing and extent of the divergent sequence evolution in peridinin plastids. It is broadly agreed that application of poly(U) tails to plastid transcripts is an ancestral feature of peridinin dinoflagellates, and it has been proposed that the fragmentation of the peridinin plastid genome into minicircles, and the reduction of the plastid genome to a minimal protein-coding gene set, are likewise ancestral (Janouskovec, et al. 2010; Dorrell, et al. 2014; Dorrell and Howe 2015). However, it is not known when other divergent evolutionary events occurred in peridinin plastids. This has in part been due to the lack of available sequence information, with essentially complete plastid coding sequences previously available only for Amphidinium carterae (Nisbet, et al. 2004), and for two strains of Symbiodinium (clade C3, and clade Mf) (Barbrook, et al. 2014; Mungpakdee, et al. 2014). In addition, the phylogenetic relationships within the peridinin dinoflagellates remain poorly resolved. While Amphidinium is agreed to diverge at the base of the peridinin dinoflagellates, the branching order of other lineages remains debated (Hoppenrath and Leander 2010; Bachvaroff, et al. 2014; Gavelis, et al. 2015). We wished to generate a robust phylogeny of extant peridinin dinoflagellate lineages, and use this phylogeny to answer three questions: (1) which of the divergent features associated with peridinin plastid sequences probably have arisen in an ancestor to all species studied; (2) to what extent and in which dinoflagellate lineages have divergent plastid evolution events occurred more recently, and (3) whether there are any consistent trends across the dinoflagellates in terms of which plastid-encoded proteins, or regions of plastid-encoded proteins, are the most divergent, extending from their common ancestor through to extant species.

We present a taxonomically detailed reconstruction of the evolution of peridinin plastid sequences. We have focused on identifying novel plastid transcripts, which allows us to assess the composite effects of divergent gene evolution and transcript editing on peridinin plastid sequences (Zauner, et al. 2004; Dorrell and Howe 2015). We have annotated over 300 new peridinin plastid sequences from published transcriptome resources, and demonstrate that the twelve photosystem genes previously identified in peridinin plastids (Mungpakdee, et al. 2014; Dorrell and Howe 2015) probably represent the complete protein-coding component of the plastid genome of a common ancestor of all extant photosynthetic dinoflagellates. We have used these sequences to generate a well-resolved phylogeny of peridinin plastids, uncovering novel evolutionary relationships among the major dinoflagellate lineages, and have used this phylogeny to determine when different divergent evolutionary events have occurred in peridinin plastids. To disentangle the different factors that have underpinned this unusual sequence evolution, we have calculated substitution rates, and Ka/Ks ratios (also referred to as dN/dS, and defined as the relative enrichment in non-synonymous substitutions, Ka, to synonymous substitutions, Ks, over a particular sequence, which provides an indicator of the strength of selective pressure (Yang and Bielawski 2000; Hurst 2002)) for each dinoflagellate species and each residue of each plastid-encoded protein studied.
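The Ka/Ks definition given above can be written out explicitly. The following is a minimal sketch assuming simple per-site proportions; the symbols N_d, S_d, N and S are introduced here for illustration only and do not appear in the original text, and the estimator actually used in this study may additionally correct for multiple substitutions at the same site.

```latex
% N_d, S_d : counts of non-synonymous and synonymous differences between two sequences (illustrative symbols)
% N, S     : numbers of non-synonymous and synonymous sites compared (illustrative symbols)
K_a = \frac{N_d}{N}, \qquad
K_s = \frac{S_d}{S}, \qquad
\frac{K_a}{K_s} = \frac{N_d / N}{S_d / S}
```

Under this reading, a ratio well below 1 indicates that non-synonymous changes are retained less often than synonymous ones, and an increase in the ratio between lineages is what the text interprets as a change in selective pressure.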
We show that the origin of the peridinin plastid was specifically marked by an elevation in substitution rates and in Ka/Ks ratios, consistent with changes in selection pressure in the dinoflagellate common ancestor. We additionally show that these and other divergent features, such as in-frame insertions and alternative translation initiation codons, have continued to evolve progressively in individual dinoflagellate clades. Finally, we show that in both the common ancestor of all studied peridinin dinoflagellates and in extant species, elevated Ka/Ks ratios are concentrated on genes encoding photosystem I subunits, and codons encoding stromal-facing residues of plastid proteins. This may suggest that specific changes to dinoflagellate physiology have driven the divergent evolution of the peridinin plastid. Ultimately, our study provides valuable insights into the evolutionary history of this unusual plastid lineage, from its very origins to subsequent diversification.

Results

1. Annotation of new peridinin plastid sequences from transcriptome data

Identification of plastid sequences

New dinoflagellate orthologues of the twelve protein-coding genes (atpA, atpB, petB, petD, psaA, psaB, psbA, psbB, psbC, psbD, psbE and psbI) previously found to be retained in peridinin plastids (Howe, et al. 2008; Barbrook, et al. 2014; Mungpakdee, et al. 2014) were identified in the NCBI EST library, and transcriptome libraries within the Marine Microeukaryote Transcriptome Sequencing Project (MMETSP) (Keeling, et al. 2014). A total of 381 sequences were identified, 364 of which had no previously annotated equivalents (Fig. 1; Table S1). The majority (348) of the novel sequences were identified from previously unannotated transcripts within the MMETSP libraries, with only a few sequences (16) identifiable from EST libraries located in NCBI (Fig. 1). Sequences of plausible plastid origin were identified in all but a few peridinin dinoflagellate MMETSP libraries, with a minimum of 15 novel orthologues found for each gene studied (Table S1), and complete sets of protein-coding plastid genes identified for 13 new dinoflagellate species in addition to the three already characterised (Howe, et al. 2008; Barbrook, et al. 2014; Mungpakdee, et al. 2014).

Novel sequences are of probable plastid origin

Uninterrupted poly(T) stretches of 4 bp or longer were detected on the 3' end of half (191/381) of the novel dinoflagellate sequences, consistent with the presence of the poly(U) tails associated with dinoflagellate plastid transcripts (Fig. S1; Table S1). Across the entire dataset, only one sequence was found that terminated in a possible 3' poly(A) tail, and none contained evidence of 5' spliced-leader sequences, or plausible tripartite targeting sequences (consisting of a signal peptide, an ASAFAP-delimited transit peptide, and a downstream hydrophobic region), as are associated with nucleus-encoded, plastid-targeted proteins in peridinin dinoflagellates (Nassoury and Morse 2005; Zhang, et al. 2007) (Table S1).

Limited additional coding sequences in peridinin plastids

The entire assembled transcriptome dataset was screened for further transcripts that might originate from peridinin plastids. We could not find convincing evidence for any further genes that are plastid-encoded in other non-dinoflagellate lineages, and might still be plastid-encoded within individual dinoflagellates, beyond the twelve photosystem genes previously documented (Fig. S2; Supplementary Results, Section 1). The entire dataset was additionally searched for homologues of the four plastid genes (rpl28, rpl33, ycf16, ycf24) proposed to have been gained by horizontal transfer from bacteria into specific peridinin plastids (Moszczynski, et al. 2012). A total of 148 new sequences were identified and inspected using single-gene phylogenies. While the original Pyrocystis lunula and Ceratium horridum sequences grouped with Bacteroidetes, none of the homologues within this dataset did: instead, the majority resolved as a monophyletic group and all grouped either within, or as sister-groups to, other plastid or cyanobacterial lineages, with robust (>90%) bootstrap support (Fig. S3-S6). None of the rpl28, rpl33, ycf16 or ycf24 transcript sequences identified in this study possessed a 3' poly(U) tail (Table S1); however, many contained 3' poly(A) tails, spliced-leader sequences (Table S1), and tripartite plastid targeting sequences (Fig. S7), consistent with a nuclear origin.
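The 3' poly(T) criterion used above (an uninterrupted run of at least 4 T at the 3' end of a transcript) can be illustrated with a short sketch. This is not the authors' pipeline: the function names and the default threshold parameter are hypothetical, and the sketch assumes each transcript is supplied as a plain nucleotide string in its sense orientation.

```python
import re


def trailing_poly_t_length(seq: str) -> int:
    """Length of the uninterrupted run of T (or U) at the 3' end of seq."""
    match = re.search(r"[TU]+$", seq.upper())
    return len(match.group(0)) if match else 0


def has_poly_u_tail(seq: str, min_length: int = 4) -> bool:
    """True if the transcript ends in at least min_length T/U residues.

    min_length=4 mirrors the "4 bp or longer" threshold quoted in the text;
    it is exposed as a parameter because other thresholds are plausible.
    """
    return trailing_poly_t_length(seq) >= min_length


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    for name, seq in [("tailed", "ATGGCATTTTT"), ("untailed", "ATGGCATTTA")]:
        print(name, trailing_poly_t_length(seq), has_poly_u_tail(seq))
```

A screen of this kind only flags candidates; as the text notes, the absence of poly(A) tails, spliced-leader sequences and targeting peptides is also needed to argue for a plastid origin.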
Homologues of the four novel open reading frames previously identified on minicircles located in the Amphidinium carterae plastid (Barbrook and Howe 2000; Dorrell and Howe 2015) were searched for across the entire transcript dataset. Only equivalents of A. carterae ORF1, ORF2 and ORF3 were detected, and these were limited to the related species Amphidinium massartii (Fig. S8, panel A). The ORF-like sequences identified in A. massartii were highly divergent from those of A. carterae, with only 46% (ORF1), 32% (ORF2) and 33% (ORF3) sequence conservation between the two sequences, compared to (for example) 97% sequence conservation between the psbD sequences from each species (Fig. S8, panel A). We could find only limited evidence for the presence of additional conserved polyuridylylated transcripts that might correspond to novel plastid ORFs within the dataset (Fig. S8; Supplementary Results, Section 1).

2. Reconstruction of phylogenetic relationships between peridinin dinoflagellates

A concatenated protein alignment (3,410 amino acids, average 72.7% pairwise identities) was generated, consisting of the twelve plastid sequences studied, for each of the dinoflagellates present in MMETSP, and a reference set of fifteen non-dinoflagellates (Table S3). Bayesian and Maximum Likelihood trees were generated from this alignment (Fig. 2; Table S4). Two phylogenetically distinct sets of plastid sequences were identified for clade A Symbiodinium, one of which resolved with other Symbiodinium species, and the other as a sister group to A. carterae, which presumably represents a contamination within the Symbiodinium A MMETSP library (Fig. 2; Table S1).

Consistent with previous data, Amphidinium was identified as the earliest diverging peridinin dinoflagellate genus (Fig. 2) (Dorrell and Howe 2015; Gavelis, et al. 2015). Following Amphidinium, the next most basal dinoflagellates were Togula jolla, and a clade consisting of Prorocentrales, Peridiniales, and the (previously Suessialean) species Pelagodinium beii, which diverged from a clade consisting of Gonyaulacales, Suessiales, and the (previously Peridinialean) genus Heterocapsa, with moderate to robust support (in Bayesian analysis, >60% in ML trees) (Fig. 2). Identical (Fig. S9) or nearly identical (Fig. S10) topologies to the original tree were obtained in trees calculated from alignments from which long branches, fast evolving sites, or individual plastid genes had been removed (Supplementary Results, Section 2), suggesting that the initial tree topology was largely accurate.

3. Dinoflagellate-wide changes in plastid sequence composition

Changes in plastid GC content may have occurred before the radiation of the dinoflagellates

The first, second and third position GC content were compared across a 4,478 nt gap-free alignment of six plastid genes (psaA, psaB, psbA, psbB, psbC, psbD), for each of the dinoflagellates studied, and all of the non-dinoflagellate sequences previously used for the multigene phylogeny (Fig. 2). Elevated GC contents were observed at third codon positions in many of the dinoflagellates compared to non-dinoflagellate species (Fig. S11). These included third position GC content values of >35% in three of the earliest diverging dinoflagellate clades (Fig. S11; Amphidinium GC[3] = 39.6-49.6%, Peridiniales GC[3] = 29.8-40.9%, Prorocentrales GC[3] = 31.7-39.5%). However, the chromerid Vitrella brassicaformis, which forms the closest sister-group to the dinoflagellates within the multigene tree (Fig. 2), also has a high third position GC content (Fig. S11; 46.9%), so it is possible that this GC enrichment is not specific to dinoflagellates.
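The position-specific GC comparison described above can be sketched as follows. This is an illustrative calculation only, assuming an in-frame coding sequence with no gaps; the function name is hypothetical and the code is not taken from the study.

```python
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """Percentage GC at first, second and third codon positions.

    Assumes cds is in frame and gap-free; any trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete final codon
    percentages = []
    for offset in range(3):
        bases = cds[offset:usable:3]  # every third base from this offset
        gc_count = sum(base in "GC" for base in bases)
        percentages.append(100.0 * gc_count / len(bases) if bases else 0.0)
    return percentages[0], percentages[1], percentages[2]


if __name__ == "__main__":
    # Toy sequence (codons ATG, GCC), invented for illustration.
    print(gc_by_codon_position("ATGGCC"))  # (50.0, 50.0, 100.0)
```

The third value returned corresponds to the third-position GC content (GC[3]) quoted in the text for Amphidinium, Peridiniales and Prorocentrales.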
Limited changes to plastid translation in the dinoflagellate ancestor

Across the entire 4,478 nt gap-free plastid alignment, 18 codons occurred with lower frequency (and 19 codons with higher frequency) in dinoflagellates compared to non-dinoflagellate species (one-way ANOVA, P<0.05; Table S5; Fig. S12). Ten amino acids were likewise found to occur with lower frequency, and 5 with higher frequency, in dinoflagellates (one-way ANOVA, P<0.05; Table S5; Fig. S12) over this alignment, as inferred by a standard translation table. We could not identify any convincing evidence for changes to plastid translation tables within the dinoflagellates (Table S6; Fig. S13; Supplementary Results, Section 3).

A much smaller number of codons were found to have undergone specific de-enrichments (6) or enrichments (8; chi-squared test, P<0.05; Table S5, Fig. S12) in a common ancestor of all studied dinoflagellates (inferred by comparing the regressed ancestral sequence of all dinoflagellates studied to that of the regressed ancestral sequence of the common ancestor of all studied dinoflagellates and Vitrella brassicaformis, which was the closest sister-group to dinoflagellates included in the alignment (Janouskovec, et al. 2010)). Comparing the two datasets, only four codons (ATA-Ile, AGA-Arg, ACC-Thr, and TAT-Tyr) were found both to occur at significantly higher frequencies in dinoflagellates than non-dinoflagellates, and to have undergone a specific enrichment in the common ancestor of all studied dinoflagellates (Table S5; Fig. S12). Similarly, only one codon (CGT-Arg) was found to occur at a significantly lower frequency in dinoflagellates, and to have undergone a specific de-enrichment in the common ancestor of all studied dinoflagellates (Table S5; Fig. S12). Finally, only one amino acid (tyrosine) was found to have undergone a significant change in frequency in the common ancestor of all studied dinoflagellates (Table S5; Fig. S12), suggesting overall that relatively limited changes to plastid codon usage are associated with dinoflagellate origins.

4. Dinoflagellate-wide changes to plastid sequence evolution

Elevated pairwise substitutions at the origin of dinoflagellates

Total numbers of pairwise substitutions were calculated for every species used in the multigene phylogeny, over the 4,478 nucleotide gap-free alignment (Fig. 3; Table S7). The dinoflagellate sequences were highly divergent from the non-dinoflagellate species (Fig. 3; compare top left hand corner of figure to remainder). On average, dinoflagellate and non-dinoflagellate species pairs were separated by 1,549 nucleotide substitutions, while pairs of non-dinoflagellate species were separated by an average of 994 substitutions, which was significantly lower (Table S7; Fig. S14, panel A; one-way ANOVA, P = 1.86 x 10^-179). Even the related alveolate lineage Vitrella brassicaformis, which was separated from other non-dinoflagellate species by a much larger number of substitutions (average value 1276) was significantly less divergent than dinoflagellate species studied (Table S7; Fig. S14, panel A; P = 1.2 x 10^-18), suggesting that this elevated substitution rate is specifically associated with dinoflagellate species.

We tested whether the elevated numbers of substitutions observed between dinoflagellate and non-dinoflagellate species were related either to plastid GC content, codon usage, or amino acid composition (Figs. S14, S15; Supplementary Results, Section 4). While changes to plastid GC content and codon usage were correlated to the total numbers of pairwise substitutions observed (Fig. S15), alignment recoding to remove these effects did not eliminate the elevated substitution rates associated with dinoflagellate species (Fig. S14, panel B).
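Counting pairwise substitutions over a gap-free alignment, as described above, can be sketched in a few lines. This is a hedged illustration under simple assumptions: the alignment is given as a dictionary of equal-length strings, gaps are written as "-" or ".", and every mismatch counts as one substitution with no correction for multiple hits. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

GAP_CHARS = set("-.")  # assumed gap symbols


def gap_free_columns(alignment: Dict[str, str]) -> List[int]:
    """Indices of columns in which no sequence carries a gap."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [i for i in range(length)
            if all(seq[i] not in GAP_CHARS for seq in seqs)]


def pairwise_substitutions(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Number of mismatching gap-free columns for every pair of sequences."""
    columns = gap_free_columns(alignment)
    counts = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        counts[(name_a, name_b)] = sum(
            seq_a[i].upper() != seq_b[i].upper() for i in columns
        )
    return counts


if __name__ == "__main__":
    # Toy alignment, invented for illustration; column 4 is dropped (gap in sp1).
    toy = {"sp1": "ATGC-A", "sp2": "ATGCTA", "sp3": "TTGCTC"}
    print(pairwise_substitutions(toy))
```

Restricting the count to columns that are gap-free in every sequence keeps all pairwise values comparable, since each pair is then scored over the same set of sites.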
Elevated Ka/Ks ratios at the origin of dinoflagellates

Pairwise Ka/Ks ratios (which provide information regarding the strength of selective pressure acting on individual sequences, as expressed by the ratio of non-synonymous to synonymous substitutions) were additionally calculated for each species pair (Fig. 3; Table S7). Similar to the situation observed for total substitution rates, much higher pairwise Ka/Ks ratios were observed between dinoflagellate and non-dinoflagellate species pairs (average value 0.0592) than within non-dinoflagellate species pairs (Fig. S16, panel A; average value 0.0183; P = 2.75 x 10^-111). A dramatic difference was also observed for the Ka/Ks ratios calculated between dinoflagellates and non-dinoflagellate species, and Vitrella and all other non-dinoflagellate species (Fig. S16, panel A; average value 0.0271; P = 2.87 x 10^-18).

We tested whether the elevated Ka/Ks ratios observed in dinoflagellates might be related to the extremely high total numbers of pairwise substitutions observed, for example due to a saturating substitution rate at codon third positions leading to an underestimate of the total synonymous substitutions between dinoflagellate and non-dinoflagellate species (Figs. S17-S18; Supplementary Results, Section 5). The third position substitution rates between dinoflagellate and non-dinoflagellate species, while high, were not at saturation rate (Figs. S17, S18; Supplementary Results, Section 5). Other variables tested were either not correlated to the pairwise Ka/Ks ratios obtained (in the case of amino acid composition), or
were not sufficient (judged by alignment recoding) to explain the differences in Ka/Ks ratios observed (in the case of third position GC content, and codon usage; Figs. S16, S19; Supplementary Results, Section 6).

5. Lineage-specific changes to peridinin plastid sequences

Extremely elevated Ka/Ks ratios within dinoflagellates

Some dinoflagellate species were found to have extremely elevated pairwise Ka/Ks ratios calculated relative to other dinoflagellates in the alignment (Fig. 3). For example, Pelagodinium beii was separated from all other dinoflagellates by a minimum number of 1,400 substitutions, and Brandtodinium nutriculum was separated from all other dinoflagellates by a minimum number of 1,399 substitutions (Fig. 3; Table S7), both of which were far larger than the average minimum number of total pairwise substitutions (416) calculated for other dinoflagellate species (Z-test, P < 0.05). These separations were found to be independent of plastid GC content and codon usage patterns in both species (Fig. S20; Supplementary Results, Section 8). Both species were also found to have elevated Ka/Ks ratios (minimum P. beii Ka/Ks 0.0618; minimum B. nutriculum Ka/Ks 0.0408; average dinoflagellate minimum Ka/Ks 0.0174; Fig. 3), but these differences were eliminated by removing codon third positions from Ka/Ks calculations, suggesting that they are the result of saturating mutation rates in each species (Fig. S20; Supplementary Results, Section 8).
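The removal of codon third positions can be pictured with a short sketch. It assumes an in-frame coding sequence whose length is a multiple of three; the function name and the example fragment are hypothetical, and the recoding procedure actually used is the one described in the Supplementary Results, which is not reproduced here.

```python
def strip_third_positions(coding_seq: str) -> str:
    """Drop every third nucleotide from an in-frame coding sequence."""
    if len(coding_seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    # Positions 0 and 1 of each codon are kept; position 2 is discarded.
    return "".join(
        base for index, base in enumerate(coding_seq) if index % 3 != 2
    )


# Hypothetical in-frame fragment made of the codons ATG, GCT and AAA.
example = strip_third_positions("ATGGCTAAA")
```

Because most synonymous changes fall at third positions, comparing results with and without them is one way to check whether an elevated ratio reflects saturation at these sites.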
A dramatic evolutionary divergence was observed within members of the Gonyaulacales and Suessiales (Fig. 3, bottom right-hand sector; Fig. S21). Species within these lineages had extremely high pairwise Ka/Ks ratios, with a maximum value of 0.284 between the Gonyaulacales Protoceratium reticulatum and Pyrodinium bahamense, which is significantly greater than the average maximum pairwise Ka/Ks ratio (0.140) observed for all dinoflagellates (Fig. 3; Table S7; Z-test, P < 0.05). The average pairwise Ka/Ks ratio between Gonyaulacalean and Suessialean dinoflagellates (0.110) was significantly higher than the average pairwise Ka/Ks ratio between other dinoflagellate pairs (0.043; Fig. S21, panel A; P = 1.57 x 10^-09). This was found to be independent of both third position substitutions, and changes to codon usage in these species (Supplementary Results, Section 8). In contrast, the average number of pairwise substitutions between Gonyaulacalean and Suessialean species (711) was significantly lower (P = 0) than the average pairwise substitution frequencies observed between other dinoflagellates (1417; Fig. 3; Fig. S21, panel B), indicating that the rapid divergent evolution within this lineage is specifically due to an elevated Ka/Ks ratio.

Widespread evolution of alternative translation initiation codons in peridinin plastids

44% of all the sequences identified lacked a plausible ATG initiation codon, hence probably use alternative translation initiation codons (Table S8). Eight different codons were identified as probable alternative initiation sites for individual peridinin plastid transcripts, with TTG and ATT occurring the most frequently (Fig. 4; Table S8). None of the alternative translation initiation codons identified was conserved across all dinoflagellates, and the majority were species-specific (Fig. 4, panel A). However, twenty alternative initiation codons were conserved across multiple dinoflagellate species (Fig. 4, panel B, square labels), such as the adoption of a TTG alternative initiation codon in psbA in a common ancestor of Alexandrium, Gambierdiscus, Pyrodinium and Gonyaulax sp. (Fig. 4, panel B, label I; Fig. S22).
Some lineages utilise alternative initiation codons more frequently than others, with six alternative initiation codons (petB-GTG, petD-KTG, psaB-ATW, psbC-ATC, psbD-ATY, psbE-ATT) found in Scrippsiella hangoei and its sister species Peridinium aciculiferum (Fig. 4, panel B, labels C, D), compared to only one (atpA-TTG) in Amphidinium sp. (Fig. 4, panel B, label A).

Multiple discrete changes to plastid protein sequences within dinoflagellates

A total of 111 insertions and 48 deletions distributed across 80 positions were identified within dinoflagellates (Fig. 4, panel A; Table S9). This contrasts with the situation for the non-dinoflagellate species in the alignment, for which only 19 insertions and 22 deletions were identified (Table S9). Twelve insertions and five deletions were conserved across all dinoflagellates (Fig. 4, panel A; Fig. S23, panel A), although many of these indels have undergone substantial expansions or contractions in individual species (Fig. S23, panel B). The majority of indels, however, were restricted to individual dinoflagellate species (77 insertions, 25 deletions) or clades (22 insertions, 18 deletions) (Fig. 4, panel A; Fig. S23, panel C). These included two insertions (in PsaB, insertion starting at consensus residue 41; and PsbC, residue 197) and one deletion (in PsbB, starting at consensus residue 291) that evolved in a common ancestor of all species studied, except for the basally divergent Amphidinium, and one insertion (PsaB, residue 168) and one deletion (PsaB, residue 602) in a common ancestor of Gonyaulacales and Suessiales (Fig. 4, panel B; triangular labels).

Instances in which a residue that is conserved in all other plastid species studied was lost in dinoflagellates were tabulated for one representative protein, the ATP synthase subunit AtpA. 200 such changes were found, of which only 15 were ancestral (Fig. 4, panel A; Table S10). Of the remaining 185 substitutions, 93 were species-specific, while 92 were shared across specific dinoflagellate clades. These clade-specific changes included the loss of thirteen otherwise conserved residues in a common ancestor of all genera except Amphidinium, and two in a common ancestor of Gonyaulacales and Suessiales (Fig. 4, panel B, circular labels; Table S10).
6. Identification of consistent trends in peridinin plastid evolution

Photosystem I sequences in peridinin plastids have elevated Ka/Ks ratios

An elevated dinoflagellate Ka/Ks ratio was observed in the two photosystem I subunit genes (psaA and psaB) retained in dinoflagellate plastids. For psaA, the dinoflagellate Ka/Ks (0.481) was 4.89 times that of the non-dinoflagellate value (0.098); whereas for psaB the dinoflagellate Ka/Ks (0.446) was 4.47 times that of the non-dinoflagellate value (0.099; Fig. 5, panel A). Both the Ka/Ks ratios observed were substantially greater than those observed for dinoflagellate sequences across the entire dataset, both in terms of the raw Ka/Ks ratio (0.296) and in terms of the relative enrichment (2.41) compared to the non-dinoflagellate Ka/Ks ratio (0.123; Fig. 5, panel A). Elevated Ka/Ks values were also obtained for dinoflagellate psaA and psaB genes in alignments recoded to eliminate third position substitutions and codon usage bias (Fig. S24; Tables S11, S12-S15; Supplementary Results, Section 8).

Photosystem I sequences in the dinoflagellate ancestor also had elevated Ka/Ks ratios

Individual Ka/Ks ratios were also calculated for each sequence, solely between the hypothetical sequences calculated for the common ancestor of all studied dinoflagellates, and the common ancestor of dinoflagellates and Vitrella, which should correspond to the substitutions that most probably occurred in dinoflagellates immediately following their divergence from other plastid lineages (Table S11). Elevated Ka/Ks ratios were found in both
the dinoflagellate ancestor psaA (ancestral Ka/Ks 0.425; dinoflagellate ancestor/non-dinoflagellate ratio 4.32) and psaB sequences (ancestral Ka/Ks 0.593; dinoflagellate ancestor/non-dinoflagellate ratio 5.95), compared to all other genes (ancestral Ka/Ks 0.376; dinoflagellate ancestor/non-dinoflagellate ratio 3.07; Fig. 5, panel A; Tables S11, S12). The Ka/Ks ratios associated with the psaB dinoflagellate ancestor sequence were confirmed using alignment recoding to be genuine, rather than a consequence of a highly elevated mutation rate or change in codon usage preference specific to dinoflagellate photosystem I genes (Fig. S24; Supplementary Results, Section 8).

Elevated Ka/Ks ratios are specific to photosystem I genes

Only one other gene, psbI, was found to have a greater than average Ka/Ks ratio between dinoflagellates and non-dinoflagellates (dinoflagellate Ka/Ks 0.451; non-dinoflagellate Ka/Ks 0.132; dinoflagellate/non-dinoflagellate ratio 3.39) and between the dinoflagellate ancestor and non-dinoflagellates (dinoflagellate ancestor Ka/Ks 0.512; dinoflagellate ancestor/non-dinoflagellate ratio 3.84; Fig. 5, panel A; Table S11). However, the elevated psbI Ka/Ks ratio was eliminated in alignments recoded to eliminate third position substitutions (Fig. S24; Supplementary Results, Section 8), suggesting that it is the result of a saturating mutation rate at dinoflagellate psbI third codon positions. No other genes (except for psaA and psaB) were found to have elevated associated Ka/Ks ratios.
Stromal domains of peridinin plastid proteins have elevated Ka/Ks ratios

Individual Ka/Ks ratios were calculated for each residue within each plastid gene, for dinoflagellates, non-dinoflagellates, and the common ancestor of all studied dinoflagellates (Table S11). The distribution of residues with elevated Ka/Ks was biased towards specific regions of individual proteins. For example, within psbB, 59 codons have elevated associated Ka/Ks ratios either in the common ancestor of all studied dinoflagellates (positions at the centre of a block of ten residues with aggregate Ka/Ks > 1, and Ka/Ks significantly greater than that calculated for the equivalent non-dinoflagellate residue; Z-test, P < 0.05), or within the dinoflagellates (same criteria, but with Ka/Ks > 0.5; Fig. S25). All of these codons are predicted to encode stromal- or luminal-facing psbB residues (Fig. S25).

Across the entire dataset, 171 of the 331 residues identified to have elevated Ka/Ks ratios in the common ancestor of all studied dinoflagellates were located on predicted stromal faces of plastid proteins (Fig. 5, panel B; Fig. S26, panel A; Tables S11, S12). This is significantly greater than the number (123) expected through a random distribution of residues (chi-squared, P = 6.43 x 10^-10; Table S11). The elevated Ka/Ks ratios on stromal residues were also identified in calculations performed with alignments recoded to eliminate third position substitutions and codon usage bias (Fig. S26; Supplementary Results, Section 9).
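A comparison of an observed count against a randomly expected count can be sketched as a two-category chi-squared goodness-of-fit test. The function name is hypothetical, and the set-up (stromal versus non-stromal residues, one degree of freedom) is an assumption about how such a test might be framed. Because the expected count used below is the rounded value quoted in the text, and the exact test configuration is given only in Table S11, the output should not be expected to match the reported P value.

```python
import math


def chi_squared_two_categories(observed_in: int, expected_in: float, total: int):
    """Goodness-of-fit statistic and P value for an in/out split (1 d.f.)."""
    observed = (observed_in, total - observed_in)
    expected = (expected_in, total - expected_in)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For one degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Counts quoted in the text: 171 stromal residues of 331, against 123 expected.
stat, p = chi_squared_two_categories(171, 123, 331)
```

The closed-form tail probability is used here only to keep the sketch free of external dependencies; a statistics library would normally be preferred.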
The same trends were not directly observed within the dinoflagellates, where the number of stromal residues with elevated Ka/Ks was in fact slightly lower than expected (202/582 residues; expected number 216; Fig. S26, panel B; Tables S11, S12). However, this may be influenced by the low Ka/Ks ratios observed in dinoflagellates for atpA (0.251) and atpB (0.254; Fig. 5, panel A; Table S11), which are the only two proteins encoded in peridinin dinoflagellate plastids to be entirely extrinsic to the thylakoid membrane, hence entirely stromal-facing (Walker 2013). Excluding atpA and atpB, the number of residues with elevated Ka/Ks in dinoflagellates that are predicted to face into the plastid stroma was far greater than expected (172/552 residues with elevated Ka/Ks; expected number of residues 29; P = 0; Fig. S26, panel B; Tables S11, S12). Similar to the dinoflagellate ancestor,
the enrichment in dinoflagellate stromal Ka/Ks was confirmed through alignment recoding to be genuine, rather than a result of mutation rate saturation or codon usage bias (Supplementary Results, Section 9).

Discussion

We have identified and analysed previously unannotated sequences for plastid-encoded transcripts in peridinin dinoflagellates. These data constitute over three times the number of previously annotated peridinin dinoflagellate plastid sequences, and increase the number of complete peridinin plastid protein-coding datasets fivefold (Fig. 1). Particularly large numbers of plastid transcript sequences were identified from the MMETSP transcriptome datasets (Fig. 1). This may be due to the presence of the 3' poly(U) tail on dinoflagellate plastid transcripts, which has previously been speculated to enable the enrichment of plastid transcript sequences in poly(A)-enriched RNA libraries (Wang and Morse 2006) such as those used for generation of the MMETSP libraries (Keeling, et al. 2014), and is corroborated by the presence of poly(U) tails on many of the transcripts we identified (Fig. S1; Table S1).

Almost all of the dinoflagellates investigated appear to possess the same twelve plastid-encoded protein-coding genes previously identified (Howe, et al. 2008; Barbrook, et al. 2014; Mungpakdee, et al. 2014). We found no evidence for relocation of any of these sequences to the nucleus in any species (Fig. S1; Table S1) or the retention of other plastid-derived sequences in any peridinin plastid (Fig. S2; Table S2). We additionally found only limited evidence for the presence of laterally acquired genes in the plastids of peridinin dinoflagellates, or for the conservation of other novel ORFs previously identified in individual peridinin plastid lineages across multiple species (Fig. S8; Table S2). While we cannot formally exclude that other plastid transcripts (transcripts that do not receive a 3' poly(U) tail, hence are unlikely to be present in poly(A)-enriched libraries) are produced, our data indicate that the twelve previously identified protein-coding genes represent the ancestral protein-coding component of dinoflagellate plastids.
These data have allowed us to produce a well-resolved reference tree for the branching relationships between major clades of peridinin dinoflagellates (Fig. 2; Fig. S9; Fig. S10). Several novel phylogenetic relationships were uncovered. For example, Pelagodinium beii (previously Gymnodinium beii) grouped with the Prorocentrales with reasonable support (100/64%) whereas previous studies based on single-gene phylogenies placed it within the Suessiales (Siano, et al. 2010; Decelle, et al. 2014). Similarly, sister-group relationships between Prorocentrum and the Peridiniales, and the monophyletic clade of Gonyaulacales, Suessiales and Heterocapsa (Fig. 2), have not to our knowledge been previously described (Hoppenrath and Leander 2010; Bachvaroff, et al. 2014; Gavelis, et al. 2015). Each of these novel relationships was also recovered using modified alignments from which long branches, individual genes, gapped positions, or fast-evolving sites were removed (Fig. S9; Fig. S10). We accordingly conclude that the relationships obtained within our dataset are probably genuine, and not the artifact of fast sequence evolution or recent discrete changes to plastid translation tables (which might bias the conceptual translations obtained, but presumably should be removed in the fast site analysis) in individual dinoflagellate plastids.

We have correlated divergent changes to peridinin plastid transcript sequences to the branching relationships from the multigene phylogeny, allowing us to infer when these changes occurred. For these analyses, we have focused exclusively on the sequences and conceptual translations of plastid transcripts, which provide an understanding of the aggregate consequences of divergent gene evolution and transcript editing on peridinin
plastids (Zauner, et al. 2004; Bachvaroff, et al. 2006; Dorrell and Howe 2015). As many of the sequences identified from MMETSP possess poly(U) tails (Fig. S1; Table 1), and the presence of the poly(U) tail is associated with the completion of transcript editing in dinoflagellate species (Dang and Green 2009; Dorrell, et al. 2016), we presume that the majority of the sequences identified in this study probably have been edited to completion, as opposed to representing unedited precursor transcripts.

First, we have identified changes to peridinin plastid sequence composition across the dinoflagellates. These include changes to plastid sequence GC content and codon usage (Fig. S11; Fig. S12), although many of these changes are either shared with relatives (such as the elevated third position GC content in the chromerid Vitrella; Fig. S11), or appear largely to consist of changes specific to individual dinoflagellate lineages (such as the changes to plastid codon usage frequencies, which are much more numerous in extant dinoflagellate species than in the inferred sequences of their last common ancestor; Fig. S12). We could not find convincing evidence for changes to the plastid translation table across the dinoflagellates (Fig. S13). While we cannot exclude alternative hypotheses (for example, independent increases in third position GC content in the lineages giving rise to Vitrella, and in a common ancestor of all dinoflagellates), we cannot find sufficient evidence for major changes to plastid sequence composition, or translation, occurring in the dinoflagellate common ancestor.

Next, we have identified widespread changes to the translation products of plastid sequences in dinoflagellates. These include large numbers of synonymous substitutions, and greatly elevated Ka/Ks ratios in the dinoflagellates (Fig. 3; Fig. S14; Fig. S16). The elevated Ka/Ks ratios are unlikely to be explained by changes in codon preference (Fig.
S11; Fig. S16), or saturation of synonymous substitution rates at third-position sites in dinoflagellate plastids (Fig. S17; Fig. S18; Fig. S19). It is possible that other factors related to sequence composition (e.g., transitory changes in codon preference in individual dinoflagellate lineages, or saturation of synonymous substitution rates at first- and second-position sites) may have contributed to the elevated Ka/Ks ratios observed; however, we suggest that the most parsimonious explanation for the remaining elevated Ka/Ks ratios observed in dinoflagellate plastids is changes in plastid selection pressure throughout their evolution, following their divergence from other plastid lineages, and prior to the radiation of extant species. Although previous studies have posited changes in selection pressure on certain peridinin plastid sequences (Shalchian-Tabrizi, et al. 2006), this is to our knowledge the first evidence that indicates that selective events have played a widespread role in dinoflagellate plastid evolution.

We have additionally identified changes that have occurred in individual dinoflagellate lineages since their radiation (Fig. 3; Fig. 4; Figs. S20-S23). We have found extremely elevated Ka/Ks ratios in pairwise comparisons between dinoflagellates (Fig. 3; Figs. S20, S21), and mapped multiple acquisitions of alternative translation initiation codons (Fig. S22), in-frame insertions and deletions (Fig. S23), and the loss of otherwise conserved atpA residues to the dinoflagellate tree (Fig. 4). These events have occurred progressively, with the majority of the dinoflagellate clades being marked by discrete changes to plastid sequence (Fig. 4), and contrast with the much more conservative evolution observed in other plastid lineages (Tables S8, S9). The divergent evolutionary events observed in this study await detailed biochemical characterisation. For example, it will be interesting to determine experimentally whether peridinin dinoflagellates utilise the alternative translation initiation codons identified in this study, or whether there are further translation initiation codons with weaker similarity to ATG in peridinin plastids. This could be accomplished, for example,
by proteomic characterisation of the N-termini of peridinin plastid proteins (Huesgen, et al. 2013). Regardless, our data show that peridinin plastid sequences have not remained static, but have continued to diverge from one another to a remarkable extent.

We have found evidence that different peridinin plastid lineages have evolved in different manners since their radiation. For example, we observe elevated minimum pairwise substitution rates in some Peridinialean/Prorocentralean species (e.g., Brandtodinium, Pelagodinium; Fig. 3), which is corroborated by the rather long branch lengths associated with these species in the multigene tree (Fig. 2; Fig. S9), and appears to at least partially explain the high Ka/Ks ratios observed for these species (Fig. S20). We also observe extremely high maximum pairwise Ka/Ks ratios within members of the Gonyaulacales and Suessiales, but in contrast these occur alongside relatively low pairwise substitution rates, suggesting that they have resulted from a change in plastid selective pressure in these lineages (Fig. 3; Fig. S21). Similarly, we have found that specific nodes on the peridinin plastid tree are marked by the origins of large numbers of alternative initiation codons (the common ancestor of Scrippsiella hangoei and Peridinium aciculiferum; Fig. 4), indels (the common ancestor of Amphidinium and Heterocapsa; Fig. 4), or changes to conserved atpA residues (the common ancestor of Symbiodinium; Fig. 4). It remains to be determined why the plastids of specific dinoflagellate lineages and not others are unusual, although we note that many of the most divergent species within our dataset have symbiotic life strategies (for example, Pelagodinium is an endobiont of foraminiferans (Siano, et al. 2010), whereas Brandtodinium is a radiolarian symbiont (Probert, et al. 2014)). More taxonomically detailed comparisons of plastid evolution in endosymbiotic dinoflagellates to their free-living relatives may provide insights into whether symbiosis has driven divergent plastid evolution within the dinoflagellates.
More globally, it remains to be resolved why the peridinin dinoflagellates have undergone such divergent plastid evolution compared to other lineages. Notably, we show that residues with elevated Ka/Ks ratios in peridinin plastids are concentrated, both in extant dinoflagellates and in the dinoflagellate common ancestor, on photosystem I subunits, and stromal regions of plastid proteins, which cannot be explained by changes to codon usage or third position substitution rate (Fig. 5; Fig. S24; Fig. S25; Fig. S26). While divergent evolution has previously been reported in the photosystem I sequences in individual peridinin plastid lineages (Bachvaroff, et al. 2006; Shalchian-Tabrizi, et al. 2006; Pochon, et al. 2014), this is to our knowledge the first evidence that this bias in evolutionary events is conserved throughout peridinin plastids, from the origin of dinoflagellates to extant species.

Similar, albeit less extreme, divergent evolution events to those described in this study have been observed in nuclear genes encoding plastid-targeted proteins (Bachvaroff, et al. 2006; Mungpakdee, et al. 2014) and non-plastid proteins (Kim, et al. 2011) in peridinin dinoflagellates. It will be interesting to determine if the divergent evolution events observed in nucleus-encoded and plastid-encoded genes are linked; for example if the nucleus-encoded components of dinoflagellate photosystem I, or nucleus-encoded proteins likely to interact with the stromal faces of plastid-encoded dinoflagellate proteins, likewise possess specifically elevated Ka/Ks ratios. Conserved trends in the evolution of nucleus- and plastid-encoded proteins in dinoflagellates might arise as a result of compensatory evolution events to maintain plastid physiology (for example, maintaining plastid redox state for plastid physiology and gene regulation (Allen 1993; Puthiyaveetil, et al. 2008), or balancing cyclic and linear electron flow to meet the physiological requirements of individual lineages (Reynolds, et al. 2008)), or as a result of divergent selection on plastid proteins to accommodate some of the more unusual nucleus-encoded proteins and structures present
in peridinin plastids (for example, the dinoflagellate pyrenoid (Nassoury, et al. 2005; Siano, et al. 2010), or the peridinin pigment-binding protein complexes (Haxo, et al. 1976)). Ultimately, understanding the relationships between peridinin plastid sequences, evolution and physiology may provide valuable insights into the biology of this unusual and ecologically important lineage.

Materials and Methods

Assembly and annotation of previously known peridinin plastid sequences

Nucleotide sequences corresponding to the twelve protein-coding genes (atpA, atpB, petB, petD, psaA, psaB, psbA, psbB, psbC, psbD, psbE, psbI) that are found on the three effectively complete peridinin plastid genomes (Amphidinium carterae CCAP1102/6; Symbiodinium sp., Clade C3, and Clade Mf) (Nisbet, et al. 2004; Barbrook, et al. 2014; Mungpakdee, et al. 2014), were assembled from peridinin dinoflagellate EST libraries within NCBI and the Marine Microeukaryote Transcriptome Sequencing Project (MMETSP) (Keeling, et al. 2014). Sequences were identified by tBLASTn searches, using annotated plastid protein sequences from peridinin dinoflagellates and their close relatives (listed in Table S1). Where possible, the predicted translation products of plastid transcript sequences were used, instead of the translation products of plastid gDNA sequences, due to the presence of plastid transcript editing in some dinoflagellates (Zauner, et al. 2004; Dorrell and Howe 2015).

Sequences that matched the queries with expect values of below 1 x 10^-5 were selected, and were searched using BLASTx against the entire NCBI database. Sequences that yielded a top hit against another peridinin dinoflagellate sequence were retained for subsequent analysis. Sequences that were excluded on the basis of being of probable non-dinoflagellate origin are listed in Table S1. In the case of psbI, for which peridinin dinoflagellate sequences are known to be highly divergent (Nisbet, et al. 2004), an initial expect value threshold < 1.0 was used to identify possible dinoflagellate orthologues, and these orthologues were used in turn as query sequences for a second round of reciprocal tBLASTn/BLASTx searches against each dataset, to identify even the most divergent sequences present.
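The expect value filtering step can be sketched as follows. The sketch assumes that search results are available in the standard 12-column tabular BLAST output (query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, expect value, bit score); the output format actually used is not stated here, and the function name and the sequence identifiers in the example are hypothetical.

```python
def filter_blast_hits(tabular_lines, max_evalue=1e-5):
    """Keep, per query, the highest-scoring hit with expect value below the threshold."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # hit does not pass the expect value threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical tabular hits, for illustration only.
example_lines = [
    "contig_1\tpsbA_ref\t85.0\t300\t45\t0\t1\t900\t1\t300\t2e-30\t210.0",
    "contig_1\tpsbD_ref\t40.0\t120\t72\t3\t10\t370\t5\t125\t3e-06\t55.0",
    "contig_2\tatpA_ref\t30.0\t80\t56\t2\t1\t240\t1\t80\t0.5\t28.0",
]
top_hits = filter_blast_hits(example_lines)
```

Passing `max_evalue=1.0` would correspond to the relaxed initial threshold described for psbI.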
To investigate whether novel plastid ORFs, and genes proposed to have been acquired by lateral transfer from bacteroidetes, are conserved across peridinin plastids, similar BLAST searches were performed with translation products of the four predicted ORFs (ORF1, ORF2, ORF3, and ORF4) described in the Amphidinium carterae plastid genome (Barbrook, et al. 2001; Nisbet, et al. 2004; Barbrook, et al. 2006), and the proposed plastid rpl28 and rpl23 genes from Pyrocystis lunula and ycf24 and ycf16 sequences from Ceratium horridum (Moszczynski, et al. 2012). For these genes, sequences were retained if the top reciprocal BLAST hit was homologous to the genes in question, regardless of evolutionary affinity.

Sequences that passed the reciprocal BLASTx search were assembled into contigs using Geneious v4.76 (Kearse, et al. 2012). Each nucleotide sequence was searched manually for possible 5' spliced leader sequences (Zhang, et al. 2007), and uninterrupted 3' poly(T) and poly(A) tracts of more than 3 bp length, which might respectively correspond to transcript poly(U) and poly(A) tails (Wang and Morse 2006). The predicted translation products of each contig were inspected for the presence of possible plastid-targeting sequences using SignalP version 3.0 (Bendtsen, et al. 2004) and ASAFind (Gruber, et al. 2015), and for mitochondria-targeting sequences using TargetP version 1.1 (Emanuelsson, et al. 2007). Annotated copies of each novel peridinin plastid transcript (including information on the presence of possible
poly(U) tails, poly(A) tails, spliced-leader sequences, and plastid targeting sequences) are included in Table S1.

Global identification and annotation of peridinin plastid transcripts

To determine whether further, previously undocumented ORFs are conserved across peridinin plastids, an independent, top-down search was performed for the entire dinoflagellate sequence library. We focused on transcripts possessing a 3' poly(U) tail, as this feature is believed to be uniquely associated with plastid transcripts in dinoflagellates (Dorrell and Howe 2012), and polyuridylylated transcripts have previously been indicated to be enriched by poly(A) selection, hence may be present in poly(A)-selected transcriptome datasets such as MMETSP (Wang and Morse 2006; Keeling, et al. 2014).

For this analysis, all sequences that terminated at either end in a potential poly(T) sequence (> 3 bp) from all RNA libraries were extracted, and filtered to remove sequences that either contained a more plausible reverse complement of a poly(A) sequence (judged by comparing the length of the poly(A) sequence to that of the poly(T) sequence), or a spliced-leader (6 bp spliced-leader sequence) on the other transcript end. Parallel libraries were constructed, containing all dinoflagellate sequences that by the same criteria possessed either a poly(A) tail, or spliced-leader sequences, and the remaining poly(T)-containing sequences were searched against these libraries using tBLASTx. Poly(T) sequences that were contiguous with other poly(A) or spliced-leader containing sequences (as judged by BLAST best hit; expect value < 1 x 10^-5, > 90% similarities) were likewise removed from the dataset.

Next, to identify sequences encoding proteins of probable plastid function, the filtered poly(T)-containing sequences were oriented such that the poly(T) sequence was located on the transcript 3' end, and searched by BLASTx against a composite protein dataset, consisting of the complete protein sequences encoded in the nuclear and plastid genomes of the model diatom Phaeodactylum tricornutum (Oudot-Le Secq, et al. 2007; Bowler, et al. 2008), and all dinoflagellate plastid sequences previously identified by the BLAST searches detailed above. Sequences that yielded a top hit in a reverse orientation reading frame (i.e.
such that the poly(T) sequence could not be located on a transcript 3' end), or were identified to correspond to an internal region of a transcript (judged if the 3' end of the transcript alignment corresponded to < 90% of the full subject length), that yielded a top hit against a known dinoflagellate plastid protein, or that yielded a top hit against a Phaeodactylum nucleus-encoded protein (expect value < 1 x 10^-5) were removed.

The longest ORFs from sequences that were found to possess plausible 3' ends, and were homologous to plastid-encoded proteins in Phaeodactylum that have not previously been identified in peridinin dinoflagellate plastids, were extracted, and searched for the presence of plastid- and mitochondria-targeting sequences as above. The sequences for which no plausible BLAST hit was found (and thus may contain novel plastid ORFs) were searched against one another using reciprocal tBLASTx/tBLASTx searches. Sequences that were found to match one another with a reciprocal BLAST hit expect value of < 1 x 10^-5 (and thus might correspond to conserved novel plastid proteins) were retained, assembled into contigs, and aligned using Geneious v4.76, as above. The final annotations for each poly(T)-containing sequence, as well as full nucleotide and protein sequences for the possible novel plastid ORFs identified, are presented in Table S2.

Phylogenetic analysis