University of Dundee Progressive and biased divergent evolution underpins the origin and diversification of peridinin dinoflagellate plastids Dorrell, Richard G.; Klinger, Christen M.; Newby, Robert J.; Butterfield, Erin; Richardson, Elisabeth; Dacks, Joel B.; Howe, Christopher J.; Nisbet, R. Ellen R.; Bowler, Chris Published in: Molecular Biology and Evolution DOI: 10.1093/molbev/msw235 Publication date: 2016 Document Version Accepted author manuscript Link to publication in Discovery Research Portal Citation for published version APA): Dorrell, R. G., Klinger, C. M., Newby, R. J., Butterfield, E. R., Richardson, E., Dacks, J. B.,... Bowler, C. 2016). Progressive and biased divergent evolution underpins the origin and diversification of peridinin dinoflagellate plastids. Molecular Biology and Evolution, 342), 361-379. DOI: 10.1093/molbev/msw235 General rights Copyright and moral rights for the publications made accessible in Discovery Research Portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. Users may download and print one copy of any publication from Discovery Research Portal for the purpose of private study or research. You may not further distribute the material or use it for any profit-making activity or commercial gain. You may freely distribute the URL identifying the publication in the public portal. Take down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.
MBE Advance Access published November 4, 2016 Articles:Discoveries Progressiveandbiaseddivergentevolutionunderpinstheoriginanddiversificationof peridinindinoflagellateplastids RichardG.Dorrell 1+,ChristenM.Klinger 2,RobertJ.Newby 3,ErinR.Butterfield 4,5,Elisabeth Richardson 2,JoelB.Dacks 2,ChristopherJ.Howe 6,R.EllenR.Nisbet 6,andChrisBowler 1 1EcoleNormaleSupérieure,PSLResearchUniversity,InstitutdeBiologiedel EcoleNormale SupérieureIBENS),CNRSUMR8197,INSERMU1024,46rued Ulm,F[75005Paris,France 2DepartmentofCellBiology,UniversityofAlberta 3DepartmentofBiology,MiddleTennesseeStateUniversity 4DepartmentofBiochemistry,PennsylvaniaStateUniversity 5SchoolofLifeSciences,UniversityofDundee 6DepartmentofBiochemistry,UniversityofCambridge Contributedequallytothismanuscript +Towhomcorrespondenceshouldbeaddressed:dorrell@biologie.ens.fr any medium, provided the original work is properly cited. For commercial re-use, please contact The Authors) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in journals.permissions@oup.com Downloaded from http://mbe.oxfordjournals.org/ at University of Dundee on November 11, 2016! 1!
Abstract Dinoflagellatesarealgaeoftremendousimportancetoecosystemsandtopublichealth. Thecellbiologyandgenomeorganisationofdinoflagellatespeciesishighlyunusual.For example,theplastidgenomesofperidinin?containingdinoflagellatesencodeonlya minimalnumberofgenesarrangedonsmallelementstermed minicircles.previous studiesofperidininplastidgeneshavefoundevidencefordivergentsequenceevolution, includingextensivesubstitutions,novelinsertionsanddeletions,anduseofalternative translationinitiationcodons.understandingtheextentofthisdivergentevolutionhas beenhamperedbythelackofcharacterisedperidininplastidsequences.wehave identifiedover300previouslyunannotatedperidininplastidmrnasfrompublished transcriptomeprojects,vastlyincreasingthenumberofsequencesavailable.usingthese data,wehaveproducedawell?resolvedphylogenyofperidininplastidlineages,which uncoversseveralnovelrelationshipswithinthedinoflagellates.thisenablesustodefine changestoplastidsequencesthatoccurredearlyindinoflagellateevolution,andthathave contributedtothesubsequentdiversificationofindividualdinoflagellateclades.wefind thattheoriginoftheperidinindinoflagellateswasspecificallyaccompaniedbyelevations bothintheoverallnumberofsubstitutionsthatoccurredonplastidsequences,andinthe Ka/Ksratioassociatedwithplastidsequences,consistentwithchangesinselective pressure.thesesubstitutions,alongsideotherchanges,haveaccumulatedprogressively inindividualperidininplastidlineages.throughoutourentiredataset,weidentifya persistentbiastowardsnon?synonymoussubstitutionsoccurringonsequencesencoding photosystemisubunitsandstromalregionsofperidininplastidproteins,whichmayhave underpinnedtheevolutionofthisunusualorganelle.! 2!
Introduction Dinoflagellatesareeukaryoteswithimmenseecologicalandevolutionaryimportance.They includephotosynthetic,heterotrophic,mixotrophic,andparasiticrepresentativesdorrell andhowe2015).thephotosyntheticspeciesareamajorcomponentofplankton communitiesinmarineandfreshwaterenvironmentsleterme,etal.;devargas,etal. 2015),andincludecausativeagentsofharmfulalgalbloomsProtoperidinium,-Ceratium) Hallegraeff2010;Hinder,etal.2012),symbiontsofcoralsSymbiodinium)andmarine protozoapelagodinium,-brandtodinium)siano,etal.2010;probert,etal.2014).thenon[ photosyntheticspeciesincludeparasitesofmarineinvertebrateshematodinium,- Syndiniales,andellobiopsids)Gornik,etal.2012),andbioluminescentphagotrophs Noctiluca)Nakamura1998).Dinoflagellatesaremembersofthealveolates,agroupthat additionallycontainsimportantlaboratorymodelspeciesciliatessuchasparamecium-and Tetrahymena),pathogensofterrestrialandmarineanimalse.g.,theapicomplexanparasites Plasmodium-andToxoplasma,andthemolluscpathogenPerkinsus),ecologicallysignificant heterotrophsciliatesandgregarines).twootherphotosyntheticalveolateshavebeen describedthe chromerids Chromera-andVitrella),bothofwhicharecloselyrelatedtothe apicomplexans,andserveasmodelorganismsforunderstandingtheoriginsofparasitismin thislineagejanouskovec,etal.2010;dorrellandhowe2015). DinoflagellatesarerenownedfortheirunusualanddistinctivecellbiologyLin2011; WisecaverandHackett2011;DorrellandHowe2015).Unlikethoseofallotherstudied eukaryotes,dinoflagellatechromosomesaremaintainedinapermanentlycondensedstate, anddonotprincipallyutilisehistonesfordnapackaginggornik,etal.2012).the mitochondrialgenomesofdinoflagellatesandapicomplexansarehighlyreduced,and severalotherwisewell[conservedsubunitsoftheatpsynthasecomplexhavenotbeen documentedinanydinoflagellatesbutterfield,etal.2013;janouškovec,etal.2013). Oneoftheoddesttraitsofdinoflagellatesistheirpossessionofunusualplastids.Whileall othermajoreukaryoticgroupsareeithernon[photosynthetic,orcontainonlyoneplastid lineage,atleastfourandpotentiallyasmanyassevenphylogeneticallydistinctplastid lineageshavebeendocumentedacrossthedinoflagellatesjanouskovec,etal.2010;dorrell andhowe2015).themajorityofphotosyntheticdinoflagellatespossessplastidsthat containthelightharvestingpigmentperidininhaxo,etal.1976).these"peridininplastids" aremostlikelyofredalgalorigin,althoughtheexactendosymbioticeventsthroughwhich theyoriginatedremaindebatedkeeling2010;ševčíková,etal.2015).phylogeneticstudies haveindicatedthattheperidininplastidwaspresentinacommonancestorof dinoflagellates,chromerids,andapicomplexans,andthatthealternativeplastidsfoundin somedinoflagellateshaveoriginatedthroughserialendosymbiosisjanouskovec,etal. 2010;DorrellandHowe2015). Thegenomeoftheperidininplastidisthesmallestknownfromaphotosyntheticplastid, retainingonlytwelveprotein[codinggenesbarbrook,etal.2014;mungpakdee,etal.2014), plusribosomalrnaandsometransferrnagenesbarbrook,etal.2006;nelson,etal. 2007).Theproteincodinggenessolelyencodecoresubunitsofthephotosyntheticelectron transportmachinery,comprisinggenesencodingsixsubunitsofphotosystemiipsba,-psbb,- psbc,-psbd,-psbe,-psbi),andtwosubunitseachofphotosystemipsaa,-psab),cytochrome b 6 /fpetb,-petd)andplastidatpsynthaseatpa,-atpb).thesegenesarelocatedonsmall elementstermed minicircles of1600[6600bplength)zhang,etal.1999;nelsonand Green2005)and"microcircles"of400[600bplength)Nisbet,etal.2004).Minicircles! 3!
typicallycontainsinglegenes,althoughminicirclesthatcontainnogenesnisbet,etal.2004; Nelson,etal.2007),orcombinationsofmultiplegeneshavealsobeenidentifiedinmultiple dinoflagellatespecieshiller2001;nisbet,etal.2004;nelson,etal.2007).allothergenesof plastidoriginthathavebeendocumentedinperidinindinoflagellatesarelocatedinthe nucleusmorse,etal.1995;bachvaroff,etal.2004;mungpakdee,etal.2014).some minicirclescontainsizeableopenreadingframesofunknownfunctionthatareuniqueto dinoflagellatesbarbrook,etal.2001;nisbet,etal.2004;barbrook,etal.2006).ithasbeen proposedthatsomeperidininplastidscontaingenesencodingribosomalproteinsrpl28,- rpl33)andiron[sulfurclusterbiogenesisfactorsycf16,-ycf24),whichwereacquiredby horizontalgenetransferfromnon[photosyntheticbacteria,althoughwhetherthese sequencesaregenuinelyplastid[encodedremainscontroversialmoszczynski,etal.2012; DorrellandHowe2015). Alongsidetheextremelevelofreductionobservedintheperidininplastidgenome,the transcriptsequencesproducedinperidininplastidsarehighlyunusual.thisisinpartdueto theunusualtranscriptprocessingmachineryassociatedwithperidininplastids,which includesinsomespecies)extensivein[framesequenceeditingzauner,etal.2004;dorrell andhowe2015)andinalldocumentedspecies)theadditionofa3 polyu)tailtoplastid mrnaswangandmorse2006;barbrook,etal.2012),whichcontrastwiththe3'polya)tail and5'spliced[leadersequencesaddedtotranscriptsindinoflagellatenucleizhang,etal. 2007).Inaddition,individualgeneswithinperidininplastidsarehighlydivergentfrom orthologuesfromotherplastidlineagesshalchian[tabrizi,etal.2006;pochon,etal.2014). Genesencodedlocatedinperidininplastidsfrequentlycontainextensivesequence substitutionsbarbrook,etal.2014)andin[frameinsertionsanddeletionsbarbrook,etal. 2006;Barbrook,etal.2014),haveunusualcodonusagepreferencesInagaki,etal.2004; Bachvaroff,etal.2006)andusearangeofalternativetranslationinitiationcodonsin additiontoatgata,att,gta,ttg)zhang,etal.1999;barbrook,etal.2014). Thisprojectwasconceivedtoinvestigatethetimingandextentofthedivergentsequence evolutioninperidininplastids.itisbroadlyagreedthatapplicationofpolyu)tailstoplastid transcriptsisanancestralfeatureofperidinindinoflagellates,andithasbeenproposedthat thefragmentationoftheperidininplastidgenomeintominicircles,andthereductionofthe plastidgenometoaminimalprotein[codinggeneset,arelikewiseancestraljanouskovec,et al.2010;dorrell,etal.2014;dorrellandhowe2015).however,itisnotknownwhenother divergentevolutionaryeventsoccurredinperidininplastids.thishasinpartbeenduetothe lackofavailablesequenceinformation,withessentiallycompleteplastidcodingsequences previouslyavailableonlyforamphidinium-carterae-nisbet,etal.2004),andfortwostrains ofsymbiodinium-cladec3,andclademf)-barbrook,etal.2014;mungpakdee,etal.2014). Inaddition,thephylogeneticrelationshipswithintheperidinindinoflagellatesremainpoorly resolved.whileamphidinium-isagreedtodivergeatthebaseoftheperidinin dinoflagellates,thebranchingorderofotherlineagesremainsdebatedhoppenrathand Leander2010;Bachvaroff,etal.2014;Gavelis,etal.2015).Wewishedtogeneratearobust phylogenyofextantperidinindinoflagellatelineages,andusethisphylogenytoanswer threequestions:1)whichofthedivergentfeaturesassociatedwithperidininplastid sequencesprobablyhaveariseninanancestortoallspeciesstudied;2)towhatextentand inwhichdinoflagellatelineageshavedivergentplastidevolutioneventsoccurredmore recently,and3)whetherthereareanyconsistenttrendsacrossthedinoflagellatesinterms ofwhichplastid[encodedproteins,orregionsofplastid[encodedproteins,arethemost divergent,extendingfromtheircommonancestorthroughtoextantspecies.! 4!
Wepresentataxonomicallydetailedreconstructionoftheevolutionofperidininplastid sequences.wehavefocusedonidentifyingnovelplastidtranscripts,whichallowsusto assessthecompositeeffectsofdivergentgeneevolutionandtranscripteditingonperidinin plastidsequenceszauner,etal.2004;dorrellandhowe2015).wehaveannotatedover 300newperidininplastidsequencesfrompublishedtranscriptomeresources,and demonstratethatthetwelvephotosystemgenespreviouslyidentifiedinperidininplastids Mungpakdee,etal.2014;DorrellandHowe2015)probablyrepresentthecomplete protein[codingcomponentoftheplastidgenomeofacommonancestorofallextant photosyntheticdinoflagellates.wehaveusedthesesequencestogenerateawell[resolved phylogenyofperidininplastids,uncoveringnovelevolutionaryrelationshipsbetweenamong themajordinoflagellatelineages,andhaveusedthisphylogenytodeterminewhendifferent divergentevolutionaryeventsarehaveoccurredinperidininplastids.todisentanglethe differentfactorsthathaveunderpinnedthisunusualsequenceevolution,wehavecalculated substitutionrates,andka/ksratiosalsoreferredtoasdn/ds,anddefinedastherelative enrichmentinnon[synonymoussubstitutions,ka,tosynonymoussubstitutions,ks,overa particularsequence,whichprovidesanindicatorofthestrengthofselectivepressureyang andbielawski2000;hurst2002))foreachdinoflagellatespeciesandeachresidueofeach plastid[encodedproteinstudied. Weshowthattheoriginoftheperidininplastidwasspecificallymarkedbyanelevationin substitutionratesandinka/ksratios,consistentwithchangesinselectionpressureinthe dinoflagellatecommonancestor.weadditionallyshowthattheseandotherdivergent features,suchasin[frameinsertionsandalternativetranslationinitiationcodons,have continuedtoevolveprogressivelyinindividualdinoflagellateclades.finally,weshowthatin boththecommonancestorofallstudiedperidinindinoflagellatesandinextantspecies, elevatedka/ksratiosareconcentratedongenesencodingphotosystemisubunits,and codonsencodingstromal[facingresiduesofplastidproteins.thismaysuggestthatspecific changestodinoflagellatephysiologyhavedriventhedivergentevolutionoftheperidinin plastid.ultimately,ourstudyprovidesvaluableinsightsintotheevolutionaryhistoryofthis unusualplastidlineage,fromitsveryoriginstosubsequentdiversification. Results 1.Annotationofnewperidininplastidsequencesfromtranscriptomedata Identification-of-plastid-sequences Newdinoflagellateorthologuesofthetwelveprotein[codinggenesatpA,-atpB,-petB,-petD,- psaa,-psab,-psba,-psbb,-psbc,-psbd,-psbe-andpsbi)previouslyfoundtoberetainedin peridininplastidshowe,etal.2008;barbrook,etal.2014;mungpakdee,etal.2014)were identifiedinthencbiestlibrary,andtranscriptomelibrarieswithinthemarine MicroeukaryoteTranscriptomeSequencingProjectMMETSP)Keeling,etal.2014).Atotal of381sequenceswereidentified,364ofwhichhadnopreviouslyannotatedequivalents Fig.1;TableS1).Themajority348)ofthenovelsequenceswereidentifiedfrompreviously unannotatedtranscriptswithinthemmetsplibraries,withonlyafewsequences16) identifiablefromestlibrarieslocatedinncbifig.1).sequencesofplausibleplastidorigin wereidentifiedinallbutafewperidinindinoflagellatemmetsplibraries,withaminimumof 15novelorthologuesfoundforeachgenestudiedTableS1),andcompletesetsofprotein[ codingplastidgenesidentifiedfor13newdinoflagellatespeciesinadditiontothethree alreadycharacterisedhowe,etal.2008;barbrook,etal.2014;mungpakdee,etal.2014).! 5!
Novel-sequences-are-of-probable-plastid-origin- UninterruptedpolyT)stretchesof4bporlongerweredetectedonthe3 endofhalf 191/381)ofthenoveldinoflagellatesequences,consistentwiththepresenceofthepolyU) tailsassociatedwithdinoflagellateplastidtranscriptsfig.s1;tables1).acrosstheentire dataset,onlyonesequencewasfoundthatterminatedinapossible3 polya)tail,andnone containedevidenceof5 spliced[leadersequences,orplausibletripartitetargeting sequencesconsistingofasignalpeptide,asafap[delimitedtransitpeptide,anda downstreamhydrophobicregion),asareassociatedwithnucleus[encoded,plastid[targeted proteinsinperidinindinoflagellatesnassouryandmorse2005;zhang,etal.2007)table S1). Limited-additional-coding-sequences-in-peridinin-plastids- Theentireassembledtranscriptomedatasetwasscreenedforfurthertranscriptsthatmight originatefromperidininplastids.wecouldnotfindconvincingevidenceforanyfurther genesthatareplastid[encodedinothernon[dinoflagellatelineages,andmightstillbe plastid[encodedwithinindividualdinoflagellates,beyondthetwelvephotosystemgenes previouslydocumentedfig.s2;supplementaryresults,section1).theentiredatasetwas additionallysearchedforhomologuesofthefourplastidgenesrpl28,-rpl33,-ycf16,-ycf24) proposedtohavebeengainedbyhorizontaltransferfrombacteriaintospecificperidinin plastidsmoszczynski,etal.2012).atotalof148newsequenceswereidentifiedand inspectedusingsingle[genephylogenies.whiletheoriginalpyrocystis-lunula-andceratiumhorridum-sequencesgroupedwithbacteroidetes,noneofthehomologueswithinthis datasetdid:instead,themajorityresolvedasamonophyleticgroupandallgroupedeither within,orassister[groupsto,otherplastidorcyanobacteriallineages,withrobust>90%) bootstrapsupportfig.s3[s6).noneoftherpl28,-rpl33,-ycf16-orycf24-transcriptsequences identifiedinthisstudypossesseda3 polyu)tailtables1);however,manycontained3' polya)tails,spliced[leadersequencestables1),andtripartiteplastidtargetingsequences Fig.S7),consistentwithanuclearorigin. Homologuesofthefournovelopenreadingframespreviouslyidentifiedonminicircles locatedintheamphidinium-carteraeplastidbarbrookandhowe2000;dorrellandhowe 2015)weresearchedforacrosstheentiretranscriptdataset.OnlyequivalentsofA.-carterae- ORF1,ORF2-andORF3-weredetected,andthesewerelimitedtotherelatedspecies Amphidinium-massartiiFig.S8,panelA).TheORF[likesequencesidentifiedinA.-massartiiwerehighlydivergentfromthoseofA.-carterae,withonly-46%ORF1),32%ORF2)and33% ORF3)betweenthetwosequences,comparedtoforexample)97%sequenceconservation betweenthepsbd-sequencesfromeachspeciesfig.s8,panela).wecouldfindonlylimited evidenceforthepresenceofadditionalconservedpolyuridylylatedtranscriptsthatmight correspondtonovelplastidorfswithinthedatasetfig.s8;supplementaryresults,section 1). 2.Reconstructionofphylogeneticrelationshipsbetweenperidinindinoflagellates Aconcatenatedproteinalignment3,410aminoacids,average72.7%pairwiseidentities) wasgenerated,consistingofthetwelveplastidsequencesstudied,foreachofthe dinoflagellatespresentinmmetsp,andareferencesetoffifteennon[dinoflagellatestable S3).BayesianandMaximumLikelihoodtreesweregeneratedfromthisalignmentFig.2; TableS4).TwophylogeneticallydistinctsetsofplastidsequenceswereidentifiedforcladeA Symbiodinium,oneofwhichresolvedwithotherSymbiodinium-species,andtheotherasa! 6!
sistergrouptoa.-carterae,whichpresumablyrepresentsacontaminationwithinthe Symbiodinium-AMMETSPlibraryFig.2;TableS1). Consistentwithpreviousdata,Amphidinium-wasidentifiedastheearliestdivergingperidinin dinoflagellategenusfig.2)dorrellandhowe2015;gavelis,etal.2015).following Amphidinium,-thenextmostbasaldinoflagellateswereTogula-jolla,andacladeconsistingof Prorocentrales,Peridiniales,andthepreviouslySuessialean)speciesPelagodinium-beii, whichdivergedfromacladeconsistingofgonyaulacales,suessiales,andthepreviously Peridinialean)genusHeterocapsa,withmoderatetorobustsupportinBayesiananalysis, >60%inMLtrees)Fig.2).IdenticalFig.S9)ornearlyidenticalFig.S10)topologiestothe originaltreewereobtainedintreescalculatedfromalignmentsfromwhichlongbranches, fastevolvingsites,orindividualplastidgeneshadbeenremovedsupplementaryresults, Section2),suggestingthattheinitialtreetopologywaslargelyaccurate. 3.Dinoflagellate?widechangesinplastidsequencecomposition Changes-in-plastid-GC-content-may-have-occurred-before-the-radiation-of-the-dinoflagellates- Thefirst,secondandthirdpositionGCcontentwerecomparedacrossa4,478ntgap[free alignmentofsixplastidgenespsaa,-psab,-psba,-psbb,-psbc,-psbd),foreachofthe dinoflagellatesstudied,andallofthenon[dinoflagellatesequencespreviouslyusedforthe multigenephylogenyfig.2).elevatedgccontentswereobservedatthirdcodonpositions inmanyofthedinoflagellatescomparedtonon[dinoflagellatespeciesfig.s11).these includedthirdpositiongccontentvaluesof>35%inthreeoftheearliestdiverging dinoflagellatecladesfig.s11;amphidiniumgc[3]=39.6[49.6%,peridinialesgc[3]=29.8[ 40.9%,ProrocentralesGC[3]=31.7[39.5%).However,thechromeridVitrella-brassicaformis, whichformstheclosestsister[grouptothedinoflagellateswithinthemultigenetreefig.2), alsohasahighthirdpositiongccontentfig.s11;46.9%),soitispossiblethatthisgc enrichmentisnotspecifictodinoflagellates. Limited-changes-to-plastid-translation-in-the-dinoflagellate-ancestor-- - Acrosstheentire4,478ntgap[freeplastidalignment,18codonsoccurredwithlower frequencyand19codonswithhigherfrequency)indinoflagellatescomparedtonon[ dinoflagellatespeciesone[wayanova,p<0.05;tables5;fig.s12).tenaminoacidswere likewisefoundtooccurwithlowerfrequency,and5withhigherfrequency,indinoflagellates one[wayanova,p<0.05;tables5;fig.s12)overthisalignment,asinferredbyastandard translationtable.wecouldnotidentifyanyconvincingevidenceforchangestoplastid translationtableswithinthedinoflagellatestables6;fig.s13;supplementaryresults, Section3). Amuchsmallernumberofcodonswerefoundtohaveundergonespecificde[enrichments 6)orenrichments8;chi[squaredtest,P<0.05;TableS5,Fig.S12)inacommonancestorof allstudieddinoflagellatesinferredbycomparingtheregressedancestralsequenceofall dinoflagellatesstudiedtothatoftheregressedancestralsequenceofthecommonancestor ofallstudieddinoflagellatesandvitrella-brassicaformis,whichwastheclosestsister[group todinoflagellatesincludedinthealignment-janouskovec,etal.2010)).comparingthetwo datasets,onlyfourcodonsata[ile,aga[arg,acc[thr,andtat[tyr)werefoundbothto occuratsignificantlyhigherfrequenciesindinoflagellatesthannon[dinoflagellates,andto haveundergoneaspecificenrichmentinthecommonancestorofallstudieddinoflagellates TableS5;Fig.S12).Similarly,onlyonecodonCGT[Arg)wasfoundtooccuratasignificantly! 7!
lowerfrequencyindinoflagellates,andtohaveundergoneaspecificde[enrichmentinthe commonancestorofallstudieddinoflagellatestables5;fig.s12).finally,onlyoneamino acidtyrosine)wasfoundtohaveundergoneasignificantchangeinfrequencyinthe commonancestorofallstudieddinoflagellatestables5;fig.s12),suggestingoverallthat relativelylimitedchangestoplastidcodonusageareassociatedwithdinoflagellateorigins. 4.Dinoflagellate?widechangestoplastidsequenceevolution Elevated-pairwise-substitutions-at-the-origin-of-dinoflagellates- - Totalnumbersofpairwisesubstitutionswerecalculatedforeveryspeciesusedinthe multigenephylogeny,overthe4,478nucleotidegap[freealignmentfig.3;tables7).the dinoflagellatesequenceswerehighlydivergentfromthenon[dinoflagellatespeciesfig.3; comparetoplefthandcorneroffiguretoremainder).onaverage,dinoflagellateandnon[ dinoflagellatespeciespairswereseparatedby1,549nucleotidesubstitutions,whilepairsof non[dinoflagellatespecieswereseparatedbyanaverageof994substitutions,whichwas significantlylowertables7;fig.s14,panela;one[wayanova,p=1.86x10 [179 ).Eventhe relatedalveolatelineagevitrella-brassicaformis,whichwasseparatedfromothernon[ dinoflagellatespeciesbyamuchlargernumberofsubstitutionsaveragevalue1276)was significantlylessdivergentthandinoflagellatespeciesstudiedtables7;fig.s14,panela;p= 1.2x10 [18 ),suggestingthatthiselevatedsubstitutionrateisspecificallyassociatedwith dinoflagellatespecies. Wetestedwhethertheelevatednumbersofsubstitutionsobservedbetweendinoflagellate andnon[dinoflagellatespecieswererelatedeithertoplastidgccontent,codonusage,or aminoacidcompositionfigs.s14,s15;supplementaryresults,section4).whilechangesto plastidgccontentandcodonusagewerecorrelatedtothetotalnumbersofpairwise substitutionsobservedfig.s15),alignmentrecodingtoremovetheseeffectsdidnot eliminatetheelevatedsubstitutionratesassociatedwithdinoflagellatespeciesfig.s14, panelb). Elevated-Ka/Ks-ratios-at-the-origin-of-dinoflagellates- - PairwiseKa/Ksratioswhichprovideinformationregardingthestrengthofselective pressureactingonindividualsequences,asexpressedbytheratioofnon[synonymousto synonymoussubstitutions)wereadditionallycalculatedforeachspeciespairfig.3;table S7).Similartothesituationobservedfortotalsubstitutionrates,muchhigherpairwiseKa/Ks ratioswereobservedbetweendinoflagellateandnon[dinoflagellatespeciespairsaverage value0.0592)thanwithinnon[dinoflagellatespeciespairsfig.s16,panela;averagevalue 0.0183;P=2.75x10 [111 ).AdramaticdifferencewasalsoobservedfortheKa/Ksratios calculatedbetweendinoflagellatesandnon[dinoflagellatespecies,andvitrella-andallother non[dinoflagellatespeciesfig.s16,panela;averagevalue0.0271;p=2.87x10 [18 ). WetestedwhethertheelevatedKa/Ksratiosobservedindinoflagellatesmightberelatedto theextremelyhightotalnumbersofpairwisesubstitutionsobserved,forexampleduetoa saturatingsubstitutionrateatcodonthirdpositionsleadingtoanunderestimateofthetotal synonymoussubstitutionsbetweendinoflagellateandnon[dinoflagellatespeciesfigs.s17[ S18;SupplementaryResults,Section5).Thethirdpositionsubstitutionratesbetween dinoflagellateandnon[dinoflagellatespecies,whilehigh,werenotatsaturationratefigs. S17,S18;SupplementaryResults,Section5).Othervariablestestedwereeithernot correlatedtothepairwiseka/ksratiosobtainedinthecaseofaminoacidcomposition),or! 8!
werenotsufficientjudgedbyalignmentrecoding)toexplainthedifferencesinka/ksratios observedinthecaseofthirdpositiongccontent,andcodonusage;figs.s16,s19; SupplementaryResults,Section6). 5.Lineage?specificchangestoperidininplastidsequences Extremely-elevated-Ka/Ks-ratios-within-dinoflagellates- SomedinoflagellatespecieswerefoundtohaveextremelyelevatedpairwiseKa/Ksratios calculatedrelativetootherdinoflagellatesinthealignmentfig.3).forexample, Pelagodinium-beiiwasseparatedfromallotherdinoflagellatesbyaminimumnumberof 1,400substitutions,andBrandtodinium-nutriculum-wasseparatefromallother dinoflagellatesbyaminimumnumberof1,399substitutionsfig.3;tables7),bothofwhich werefarlargerthantheaverageminimumnumberoftotalpairwisesubstitutions416) calculatedforotherdinoflagellatespeciesztest,p<0.05).theseseparationswerefoundto beindependentofplastidgccontentandcodonusagepatternsinbothspeciesfig.s20; SupplementaryResults,Section8).BothspecieswerealsofoundtohaveelevatedKa/Ks ratiosminimump.-beii-ka/ks0.0618;minimumb.-nutriculumka/ks0.0408;average dinoflagellateminimumka/ks0.0174;fig.3),butthesedifferenceswereeliminatedby removingcodonthirdpositionsfromka/kscalculations,suggestingthattheyaretheresult ofsaturatingmutationratesineachspeciesfig.s20;supplementaryresults,section8). AdramaticevolutionarydivergencewasobservedwithinmembersoftheGonyaulacalesand SuessialesFig.3,bottomrighthandsector;Fig.S21).Specieswithintheselineageshad extremelyhighpairwiseka/ksratios,withamaximumvalueof0.284betweenthe GonyaulacalesProtoceratium-reticulatum-andPyrodinium-bahamense,whichissignificantly greaterthantheaveragemaximumpairwiseka/ksratio0.140)observedforall dinoflagellates-fig.3;tables7;ztest,p<0.05).theaveragepairwiseka/ksratiobetween GonyaulacaleanandSuessialeandinoflagellates0.110)wassignificantlyhigherthanthe averagepairwiseka/ksratiobetweenotherdinoflagellatepairs0.043;fig.s21,panela; P=1.57x10 [09 ).Thiswasfoundtobeindependentofboththirdpositionsubstitutions,and changestocodonusageinthesespeciessupplementaryresults,section8).incontrast,the averagenumberofpairwisesubstitutionsbetweengonyaulacaleanandsuessialeanspecies 711)wassignificantlylowerP=0)thantheaveragepairwisesubstitutionfrequencies observedbetweenotherdinoflagellates1417;fig.3;fig.s21,panelb),indicatingthatthe rapiddivergentevolutionwithinthislineageisspecificallyduetoanelevatedka/ksratio. Widespread-evolution-of-alternative-translation-initiation-codons-in-peridinin-plastids- 44%ofallthesequencesidentifiedlackedaplausibleATGinitiationcodon,henceprobably usealternativetranslationinitiationcodonstables8).eightdifferentcodonswere identifiedasprobablealternativeinitiationsitesforindividualperidininplastidtranscripts, withttgandattoccurringthemostfrequentlyfig.4;tables8).noneofthealternative translationinitiationcodonsidentifiedwasconservedacrossalldinoflagellates,andthe majoritywerespecies[specificfig.4,panela).however,twentyalternativeinitiation codonswereconservedacrossmultipledinoflagellatespeciesfig.4,panelb,squarelabels), suchastheadoptionofattgalternativeinitiationcodonin-psba-inacommonancestorof Alexandrium,-Gambierdiscus,-Pyrodinium-andGonyaulax-sp.Fig.4,panelB,labelI;Fig.S22). Somelineagesutilisealternativeinitiationcodonsmorefrequentlythanothers,withsix alternativeinitiationcodonspetb[gtg,petd[ktg,psab[atw,psbc[atc,psbd[aty,psbe[! 9!
ATT)foundinScrippsiella-hangoei-anditssisterspeciesPeridinium-aciculiferum-Fig.4,panel B,labelsC,D),comparedtoonlyoneatpA[TTG)inAmphidinium-sp.Fig.4,panelB,labelA). Multiple-discrete-changes-to-plastid-protein-sequences-within-dinoflagellates Atotalof111insertionsand48deletionsdistributedacross80positionswereidentified withindinoflagellatesfig.4,panela;tables9).thiscontraststothesituationforthenon[ dinoflagellatespeciesinthealignment,forwhichonly19insertionsand22deletionswere identified-tables9).twelveinsertionsandfivedeletionswereconservedacrossall dinoflagellatesfig.4,panela;fig.s23,panela),althoughmanyoftheseindelshave undergonesubstantialexpansionsorcontractionsinindividualspeciesfig.s23,panelb). Themajorityofindels,however,wererestrictedtoindividualdinoflagellatespecies77 insertions,25deletions)orclades22insertions,18deletions)fig.4,panela;fig.s23, panelc).theseincludedtwoinsertionsinpsab,insertionstartingatconsensusresidue41; andpsbc,residue197)andonedeletioninpsbb,startingatconsensusresidue291)that evolvedinacommonancestorofallspeciesstudied,exceptforthebasallydivergent Amphidinium,andoneinsertionPsaB,residue168)andonedeletionPsaB,residue602)in acommonancestorofgonyaulacalesandsuessialesfig.4,panelb;triangularlabels).- Instancesinwhicharesiduethatisconservedinallotherplastidspeciesstudiedwaslostin dinoflagellatesweretabulatedforonerepresentativeprotein,theatpsynthasesubunit AtpA.200suchchangeswerefound,ofwhichonly15wereancestralFig.4,panelA;Table S10).Oftheremaining185substitutions,93werespecies[specific,while92wereshared acrossspecificdinoflagellateclades.theseclade[specificchangesincludedthelossof thirteenotherwiseconservedresiduesinacommonancestorofallgeneraexcept Amphidinium,-andtwoinacommonancestorofGonyaulacalesandSuessialesFig.4,panel B,circularlabels;TableS10). 6.Identificationofconsistenttrendsinperidininplastidevolution Photosystem-I-sequences-in-peridinin-plastids-have-elevated-Ka/Ks-ratios- - AnelevateddinoflagellateKa/KsratiowasobservedinthetwophotosystemIsubunitgenes psaa-andpsab)retained-indinoflagellateplastids.forpsaa,the-dinoflagellateka/ks0.481) was4.89timesthatofthenon[dinoflagellatevalue0.098);whereasforpsab-the dinoflagellateka/ks0.446)was4.47timesthatofthenon[dinoflagellatevalue0.099;fig. 5,panelA).BoththeKa/Ksratiosobservedweresubstantiallygreaterthanthoseobserved fordinoflagellatesequencesacrosstheentiredataset,bothintermsoftherawka/ksratio 0.296)andintermsoftherelativeenrichment2.41)comparedtothenon[dinoflagellate Ka/Ksratio0.123;Fig.5,panelA).ElevatedKa/Ksvalueswerealsoobtainedfor dinoflagellatepsaa-andpsab-genesinalignmentsrecodedtoeliminatethirdposition substitutionsandcodonusagebiasfig.s24;tabless11,s12[s15;supplementaryresults, Section8). Photosystem-I-sequences-in-the-dinoflagellate-ancestor-also-had-elevated-Ka/Ks-ratios IndividualKa/Ksratioswerealsocalculatedforeachsequence,solelybetweenthe hypotheticalsequencescalculatedforthecommonancestorofallstudieddinoflagellates, andthecommonancestorofdinoflagellatesandvitrella,whichshouldcorrespondtothe substitutionsthatmostprobablyoccurredindinoflagellatesimmediatelyfollowingtheir divergencefromotherplastidlineagestables11).elevatedka/ksratioswerefoundinboth! 10
thedinoflagellateancestorpsaa-ancestralka/ks0.425;dinoflagellateancestor/non[ dinoflagellateratio4.32)andpsab-sequencesancestralka/ks0.593;dinoflagellate ancestor/non[dinoflagellateratio5.95),comparedtoallothergenesancestralka/ks0.376; dinoflagellateancestor/non[dinoflagellateratio3.07;fig.5,panela;tabless11,s12).the Ka/KsratiosassociatedwiththepsaB-dinoflagellateancestorsequencewereconfirmed usingalignmentrecodingtobegenuine,ratherthanaconsequenceofahighlyelevated mutationrateorchangeincodonusagepreferencespecifictodinoflagellatephotosystemi genesfig.s24;supplementaryresults,section8). Elevated-Ka/Ks-ratios-are-specific-to-photosystem-I-genes- - Onlyoneothergene,psbI,wasfoundtohaveagreaterthanaverageKa/Ksratiobetween dinoflagellatesandnon[dinoflagellatesdinoflagellateka/ks0.451;non[dinoflagellateka/ks 0.132;dinoflagellate/non[dinoflagellateratio3.39)andbetweenthedinoflagellateancestor andnon[dinoflagellatesdinoflagellateancestorka/ks0.512;dinoflagellateancestor/non[ dinoflagellateratio3.84;fig.5,panela;tables11).however,theelevatedpsbi-ka/ksratio waseliminatedinalignmentsrecodedtoeliminatethirdpositionsubstitutionsfig.s24; SupplementaryResults,Section8),suggestingthatitistheresultofdueasaturating mutationrateatdinoflagellatepsbi-thirdcodonpositions.noothergenesexceptforpsaaandpsab)werefoundtohaveelevatedassociatedka/ksratios. Stromal-domains-of-peridinin-plastid-proteins-have-elevated-Ka/Ks-ratios- IndividualKa/Ksratioswerecalculatedforeachresiduewithineachplastidgene,for dinoflagellates,non[dinoflagellates,andthecommonancestorofallstudieddinoflagellates TableS11).ThedistributionofresidueswithelevatedKa/Kswasbiasedtowardsspecific regionsofindividualproteins.forexample,withinpsbb,59codonshaveelevatedassociated Ka/Ksratioseitherinthecommonancestorofallstudieddinoflagellatespositionsatthe centreofablockoftenresidueswithaggregateka/ks>1,andka/kssignificantlygreater thanthatcalculatedfortheequivalentnon[dinoflagellateresidue;z[test,p<0.05),orwithin thedinoflagellatessamecriteria,butwithka/ks>0.5;fig.s25).allofthesecodonsare predictedtoencodestromal[orluminal[facingpsbb-residuesfig.s25). Acrosstheentiredataset,171ofthe331residuesidentifiedtohaveelevatedKa/Ksratiosin thecommonancestorofallstudieddinoflagellateswerelocatedonpredictedstromalfaces ofplastidproteinsfig.5,panelb;fig.s26,panela;tabless11,s12).thisissignificantly greaterthanthenumber123)expectedthrougharandomdistributionofresidueschi[ squared,p=6.43x10 [10 ;TableS11).TheelevatedKa/Ksratiosonstromalresidueswerealso identifiedincalculationsperformedwithalignmentsrecodedtoeliminatethirdposition substitutionsandcodonusagebiasfig.s26;supplementaryresults,section9). Thesametrendswerenotdirectlyobservedwithinthedinoflagellates,wherethenumberof stromalresidueswithelevatedka/kswasinfactslightlyfewerthanexpected202/582 residues;expectednumber216;fig.s26,panelb;tabless11,s12).however,thismaybe influencedbythelowka/ksratiosobservedindinoflagellatesforatpa0.251)andatpb 0.254;Fig.5,panelA;TableS11),whicharetheonlytwoproteinsencodedinperidinin dinoflagellateplastidstobeentirelyextrinsictothethylakoidmembrane,henceentirely stromal[facingwalker2013).excludingatpaandatpb,thenumberofresidueswith elevatedka/ksindinoflagellatesthatarepredictedtofaceintotheplastidstromawas extremelygreaterthanexpected172/552residueswithelevatedka/ks;expectednumber ofresidues29;p=0;fig.s26,panelb;tabless11,s12).similartothedinoflagellateancestor,! 11
theenrichmentindinoflagellatestromalka/kswasconfirmedthroughalignmentrecoding tobegenuine,ratherthanaresultofmutationratesaturationorcodonusagebias SupplementaryResults,Section9). Discussion Wehaveidentifiedandanalysedpreviouslyunannotatedsequencesforplastid[encoded transcriptsinperidinindinoflagellates.thesedataconstituteoverthreetimesthenumberof previouslyannotatedperidinindinoflagellateplastidsequences,andincreasesthenumber ofcompleteperidininplastidprotein[codingdatasetsfivefoldfig.1).particularlylarge numbersofplastidtranscriptsequenceswereidentifiedfromthemmetsptranscriptome datasetsfig.1).thismaybeduetothepresenceofthe3'polyu)tailondinoflagellate plastidtranscripts,whichhaspreviouslybeenspeculatedtoenabletheenrichmentofplastid transcriptsequencesinpolya)enrichedrnalibrarieswangandmorse2006)suchasthose usedforgenerationofthemmetsplibrarieskeeling,etal.2014),andiscorroboratedby thepresenceofpolyu)tailsonmanyofthetranscriptsweidentifiedfig.s1;tables1). Almostallofthedinoflagellatesinvestigatedappeartopossessthesametwelveplastid[ encodedprotein[codinggenespreviouslyidentifiedhowe,etal.2008;barbrook,etal. 2014;Mungpakdee,etal.2014).Wefoundnoevidenceforrelocationofanyofthese sequencestothenucleusinanyspeciesfig.s1;tables1)ortheretentionofotherplastid[ derivedsequencesinanyperidininplastidfig.s2;tables2).weadditionallyfoundonly limitedevidenceforthepresenceoflaterallyacquiredgenesintheplastidsofperidinin dinoflagellates,orfortheconservationofothernovelorfspreviouslyidentifiedinindividual peridininplastidlineagesacrossmultiplespeciesfig.s8;tables2).whilewecannot formallyexcludethatotherplastidtranscriptstranscriptsthatdonotreceivea3'polyu) tailhenceareunlikelytobepresentinpolya)[enrichedlibraries)areproduced,ourdata indicatesthatthetwelvepreviouslyidentifiedprotein[codinggenesrepresentstheancestral protein[codingcomponentofdinoflagellateplastids. Thesedatahaveallowedustoproduceawell[resolvedreferencetreeforthebranching relationshipsbetweenmajorcladesofperidinindinoflagellatesfig.2;fig.s9;fig.s10). Severalnovelphylogeneticrelationshipswereuncovered.Forexample,Pelagodinium-beii- previouslygymnodinium-beii)-groupedwiththeprorocentraleswithreasonablesupport 100/64%)whereaspreviousstudiesbasedonsingle[genephylogeniesplaceditwithinthe SuessialesSiano,etal.2010;Decelle,etal.2014).Similarly,sister[grouprelationships betweenprorocentrumandtheperidiniales,andthemonophyleticcladeofgonyaulacales, SuessialesandHeterocapsa-Fig.2),havenottoourknowledgebeenpreviouslydescribed HoppenrathandLeander2010;Bachvaroff,etal.2014;Gavelis,etal.2015).Eachofthese novelrelationshipswerealsorecoveredusingmodifiedalignmentsfromwhichlong branches,individualgenes,gappedpositions,orfast[evolvingsiteswereremovedfig.s9; Fig.S10).Weaccordinglyconcludethattherelationshipsobtainedwithinourdatasetare probablygenuine,andnottheartifactoffastsequenceevolutionorrecentdiscretechanges toplastidtranslationtables,whichmightbiastheconceptualtranslationsobtained,but presumablyshouldberemovedinthefastsiteanalysis)inindividualdinoflagellateplastids. Wehavecorrelateddivergentchangestoperidininplastidtranscriptsequencestothe branchingrelationshipsfromthemultigenephylogeny,allowingustoinferwhenthese changesoccurred.fortheseanalyses,wehavefocusedexclusivelyonthesequencesand conceptualtranslationsofplastidtranscripts,whichprovideanunderstandingofthe aggregateconsquencesofdivergentgeneevolutionandtranscripteditingonperidinin! 12
plastidszauner,etal.2004;bachvaroff,etal.2006;dorrellandhowe2015).asmanyofthe sequencesidentifiedfrommmetsppossesspolyu)tailsfig.s1;table1),andthepresence ofthepolyu)tailisassociatedwiththecompletionoftranscripteditingindinoflagellate speciesdangandgreen2009;dorrell,etal.2016),wepresumethatthemajorityofthe sequencesidentifiedinthisstudyprobablyhavebeeneditedtocompletion,asopposedto representinguneditedprecursortranscripts. First,wehaveidentifiedchangestoperidininplastidsequencecompositionacrossthe dinoflagellates.theseincludechangestoplastidsequencegccontentandcodonusagefig. S11;Fig.S12),althoughmanyofthesechangesareeithersharedwithrelativessuchasthe elevatedthirdpositiongccontentinthechromeridvitrella;fig.s11),orappearlargelyto consistofchangesspecifictoindividualdinoflagellatelineagessuchasthechangesto plastidcodonusagefrequencies,whicharemuchmorenumerousinextantdinoflagellate speciesthanintheinferredsequencesoftheirlastcommonancestor;fig.s12).wecould notfindconvincingevidenceforchangestotheplastidtranslationtableacrossthe dinoflagellatesfig.s13).whilewecannotexcludealternativehypothesesforexample, independentincreasesinthirdpositiongccontentinthelineagesgivingrisetovitrella,and inacommonancestorofalldinoflagellates),wecannotfindsufficientevidenceformajor changestoplastidsequencecomposition,ortranslation,occurringinthedinoflagellate commonancestor. Next,wehaveidentifiedwidespreadchangestothetranslationproductsofplastid sequencesindinoflagellates.theseincludelargenumbersofsynonymoussubstitutions,and greatlyelevatedka/ksratiosinthedinoflagellatesfig.3;fig.s14;fig.s16)).theelevated Ka/KsratiossubstitutionsareunlikelytobeexplainedbychangesincodonpreferenceFig. S11;Fig.S16),orsaturationofsynonymoussubstitutionratesatthird[positionsitesin dinoflagellateplastidsfig.s17;fig.s18;fig.s19).itispossiblethatotherfactorsrelatedto sequencecompositione.g.,transitorychangesincodonpreferenceinindividual dinoflagellatelineages,orsaturationofsynonymoussubstitutionratesatfirst[andsecond[ positionsites)mayhavecontributedtotheelevatedka/ksratiosobserved;however,we suggestthatthemostparsimoniousexplanationfortheremainingelevatedka/ksratios observedindinoflagellateplastidsarechangesinplastidselectionpressurethroughouttheir evolution,followingtheirdivergencefromotherplastidlineages,andpriortotheradiation ofextantspecies.althoughpreviousstudieshavepositedchangesinselectionpressureon certainperidininplastidsequencesshalchian[tabrizi,etal.2006),thisistoourknowledge thefirstevidencethatindicatesthatselectiveeventshaveplayedawidespreadrolein dinoflagellateplastidevolution. Wehaveadditionallyidentifiedchangesthathaveoccurredinindividualdinoflagellate lineagessincetheirradiationfig.3;fig.4;figs.s20[s23).wehavefoundextremely elevatedka/ksratiosinpairwisecomparisonsbetweendinoflagellatesfig.3;figs.s20; S21),andmapmultipleacquisitionsofalternativetranslationinitiationcodonsFig.S22),in[ frameinsertionanddeletionsfig.s23),andthelossofotherwiseconservedatpa-residues tothedinoflagellatetreefig.4).theseeventshaveoccurredprogressively,withthe majorityofthedinoflagellatecladesbeingmarkedbydiscretechangestoplastidsequence Fig.4),andcontrastswiththemuchmoreconservativeevolutionobservedinotherplastid lineagestabless8,s9).thedivergentevolutionaryeventsobservedinthisstudyawait detailedbiochemicalcharacterisation.forexample,itwillbeinterestingtodetermine experimentallywhetherperidinindinoflagellatesutilisethealternativetranslationinitiation codonsidentifiedinthisstudy,orwhethertherearefurthertranslationinitiationcodons withweakersimilaritytoatginperidininplastids.thiscouldbeaccomplished,forexample,! 13
byproteomiccharacterisationofthen[terminiofperidininplastidproteinshuesgen,etal. 2013).Regardless,ourdatashowthatperidininplastidsequenceshavenotremainedstatic, buthavecontinuedtodivergefromoneanothertoaremarkableextent. Wehavefoundevidencethatdifferentperidininplastidlineageshaveevolvedindifferent mannerssincetheirradiation.forexample,weobserveelevatedminimumpairwise substitutionratesinsomeperidinialean/prorocentraleanspeciese.g.,brandtodinium,- Pelagodinium;Fig.3),whichiscorroboratedbytheratherlongbranchlengthsassociated withthesespeciesinthemultigenetreefig.2;fig.s9),andappearstoexplainatleast partiallyexplainthehighka/ksratiosobservedforthesespeciesfig.s20).wealsoobserve extremelyhighmaximumpairwiseka/ksratioswithinmembersofthegonyaulacalesand Suessiales,butincontrasttheseoccuralongsiderelativelylowpairwisesubstitutionrates, suggestingthattheyhaveresultedfromachangeinplastidselectivepressureinthese lineagesfig.3;fig.s21).similarly,wehavefoundthatspecificnodesontheperidinin plastidtreearemarkedbytheoriginsoflargenumbersofalternativeinitiationcodonsthe commonancestorofscrippsiella-hangoei-andperidinium-aciculiferum;fig.4),indelsthe commonancestorofamphidinium-andheterocapsa;fig.4),orchangestoconservedatparesiduesthecommonancestorofsymbiodinium;fig.4).itremainstobedeterminedwhy theplastidsofspecificdinoflagellatelineagesandnotothersareunusual,althoughwenote thatmanyofthemostdivergentspecieswithinourdatasethavesymbioticlifestrategies forexample,pelagodinium-isanendobiontofforaminiferanssiano,etal.2010),whereas BrandtodiniumisaradiolariansymbiontProbert,etal.2014)).Moretaxonomicallydetailed comparisonsofplastidevolutioninendosymbioticdinoflagellatestotheirfree[living relativesmayprovideinsightsintowhethersymbiosishasdrivendivergentplastidevolution withinthedinoflagellates. Moreglobally,itremainstoberesolvedwhytheperidinindinoflagellateshaveundergone suchdivergentplastidevolutioncomparedtootherlineages.notably,weshowthatresidues withelevatedka/ksratiosinperidininplastidsareconcentrated,bothinextant dinoflagellatesandinthedinoflagellatecommonancestor,onphotosystemisubunits,and stromalregionsofplastidproteins,whichcannotbeexplainedbychangestocodonusageor thirdpositionsubstitutionratefig.5;fig.s24;fig.s25;fig.s26).whiledivergentevolution haspreviouslybeenreportedinthephotosystemisequencesinindividualperidininplastid lineagesbachvaroff,etal.2006;shalchian[tabrizi,etal.2006;pochon,etal.2014),thisisto ourknowledgethefirstevidencethatthisbiasinevolutionaryeventsisconserved throughoutperidininplastids,fromtheoriginofdinoflagellatestoextantspecies. Similar,albeitlessextremedivergentevolutioneventstothosedescribedinthisstudyhave beenobservedinnucleargenesencodingplastid[targetedproteinsbachvaroff,etal.2006; Mungpakdee,etal.2014)andnon[plastidproteinsKim,etal.2011)inperidinin dinoflagellates.itwillbeinterestingtodetermineifthedivergentevolutioneventsobserved innucleus[encodedandplastid[encodedgenesarelinked;forexampleifthenucleus[ encodedcomponentsofdinoflagellatephotosystemi,ornucleus[encodedproteinslikelyto interactwiththestromalfacesofplastid[encodeddinoflagellateproteins,likewisepossess specificallyelevatedka/ksratios.conservedtrendsintheevolutionofnucleusandplastid[ encodedproteinsindinoflagellatesmightariseasaresultofcompensatoryevolutionevents tomaintainplastidphysiologyforexample,maintainingplastidredoxstateforplastid physiologyandgeneregulationallen1993;puthiyaveetil,etal.2008),orbalancingcyclic andlinearelectronflowtomeetthephysiologicalrequirementsofindividuallineages Reynolds,etal.2008),orasaresultofdivergentselectiononplastidproteinsto accommodatesomeofthemoreunusualnucleus[encodedproteinsandstructurespresent! 14
inperidininplastidsforexample,thedinoflagellatepyrenoidnassoury,etal.2005;siano, etal.2010),ortheperidininpigment[bindingproteincomplexeshaxo,etal.1976)). Ultimately,understandingtherelationshipsbetweenperidininplastidsequences,evolution andphysiologymayprovidevaluableinsightsintothebiologyofthisunusualand ecologicallyimportantlineage. MaterialsandMethods Assemblyandannotationofpreviouslyknownperidininplastidsequences Nucleotidesequencescorrespondingtothetwelveprotein[codinggenesatpA,-atpB,-petB,- petd,-psaa,-psab,-psba,-psbb,-psbc,-psbd,-psbe,-psbi)thatarefoundonthethreeeffectively completeperidininplastidgenomesamphidinium-carterae-ccap1102/6;symbiodinium-sp., CladeC3,andCladeMf)Nisbet,etal.2004;Barbrook,etal.2014;Mungpakdee,etal. 2014),wereassembledfromperidinindinoflagellateESTlibrarieswithinNCBIandthe MarineMicroeukaryoteTranscriptomeSequencingProjectMMETSP)Keeling,etal.2014). SequenceswereidentifiedbytBLASTnsearches,usingannotatedplastidproteinsequences fromperidinindinoflagellatesandtheircloserelativeslistedintables1).wherepossible, thepredictedtranslationproductsofplastidtranscriptsequenceswereused,insteadofthe translationproductsofplastidgdnasequences,duetothepresenceofplastidtranscript editinginsomedinoflagellateszauner,etal.2004;dorrellandhowe2015). Sequencesthatmatchedthequerieswithexpectvaluesofbelow1x10 [5 wereselected,and weresearchedusingblastxagainsttheentirencbidatabase.sequencesthatyieldedatop hitagainstanotherperidinindinoflagellatesequencewereretainedforsubsequentanalysis. Sequencesthatwereexcludedonthebasisofbeingofprobablenon[dinoflagellateorigin arelistedintables1.inthecaseofpsbi,forwhichperidinindinoflagellatesequencesare knowntobehighlydivergentnisbet,etal.2004),aninitialexpectvaluethreshold<1.0was usedtoidentifypossibledinoflagellateorthologues,andtheseorthologueswereusedin turnasquerysequencesforasecondroundofreciprocaltblastn/blastxsearchesagainst eachdataset,toidentifyeventhemostdivergentsequencespresent. ToinvestigatewhethernovelplastidORFs,andgenesproposedtohavebeenacquiredby lateraltransferfrombacteroidetesareconservedacrossperidininplastids,similarblast searcheswereperformedwithtranslationproductsofthefourpredictedorfsorf1,-orf2,- ORF3,-andORF4)describedintheAmphidinium-carterae-plastidgenomeBarbrook,etal. 2001;Nisbet,etal.2004;Barbrook,etal.2006),andtheproposedplastidrpl28-andrpl23 genesfrompyrocystis-lunula-andycf24-andycf16-sequencesfromceratium-horridum- Moszczynski,etal.2012).Forthesegenes,sequenceswereretainedifthetopreciprocal BLASThitwashomologoustothegenesinquestion,regardlessofevolutionaryaffinity.- SequencesthatpassedthereciprocalBLASTxsearchwereassembledintocontigsusing GeneIOUSv4.76Kearse,etal.2012).Eachnucleotidesequencewassearchedmanuallyfor possible5'splicedleadersequenceszhang,etal.2007),anduninterrupted3 polyt)and polya)tractsofmorethan3bplength,whichmightrespectivelycorrespondtotranscript polyu)andpolya)tailswangandmorse2006).thepredictedtranslationproductsofeach contigwereinspectedforthepresenceofpossibleplastid[targetingsequencesusingsignalp version3.0bendtsen,etal.2004)andasafindgruber,etal.2015),andformitochondria[ targetingsequencesusingtargetpversion1.1emanuelsson,etal.2007).annotatedcopies ofeachnovelperidininplastidtranscriptincludinginformationonthepresenceofpossible! 15
polyu)tails,polya)tails,spliced[leadersequences,andplastid targetingsequences)are includedintables1. Globalidentificationandannotationofperidininplastidtranscripts Todeterminewhetherfurther,previouslyundocumentedORFsareconservedacross peridininplastids,anindependent,top[downsearchwasperformedfortheentire dinoflagellatesequencelibrary.wefocusedontranscriptspossessinga3'polyu)tail,asthis featureisbelievedtobeuniquelyassociatedwithplastidtranscriptsindinoflagellates DorrellandHowe2012),andpolyuridylylatedtranscriptshavepreviouslybeenindicatedto beenrichedbypolya)selection,hencemaybepresentinpolya)[selectedtranscriptome datasetssuchasmmetspwangandmorse2006;keeling,etal.2014). Forthisanalysis,allsequencesthatterminatedateitherendinapotentialpolyT)sequence >3bp)fromallRNAlibrarieswereextracted,andfilteredtoremovesequencesthateither containedamoreplausiblereversecomplementofapolya)sequencelengthofpolya) sequence polyt)sequence),oraspliced[leader 6bpspliced[leadersequence)onthe othertranscriptend.parallellibrarieswereconstructed,containingalldinoflagellate sequencesthatbythesamecriteriapossessedeitherapolya)tail,orspliced[leader sequences,andtheremainingpolyt)[containingsequencesweresearchedagainstthese librariesusingtblastx.polyt)sequencesthatwerecontiguouswithotherpolya)or spliced[leadercontainingsequencesasjudgedbyblastbesthit;expectvalue<1x10 [5,> 90%similarities)werelikewiseremovedfromthedataset. Next,toidentifysequencesencodingproteinsofprobableplastidfunction,thefiltered polyt)[containingsequenceswereorientedsuchthatthepolyt)sequencewaslocatedon thetranscript3'end,andsearchedbyblastxagainstacompositeproteindataset, consistingofthecompleteproteinsequencesencodedinthenuclearandplastidgenomesof themodeldiatomphaeodactylum-tricornutumoudot[lesecq,etal.2007;bowler,etal. 2008),andalldinoflagellateplastidsequencespreviouslyidentifiedbytheBLASTsearches detailedabove.sequencesthatyieldedatophitinareverseorientationreadingframei.e. suchthatthepolyt)sequencecouldnotbelocatedonatranscript3'end),orwere identifiedtocorrespondtoaninternalregionofatranscriptjudgedifthe3'endofthe transcriptalignmentcorrespondedto<90%thefulllengthofthesubjectlength),that yieldedatophitagainstaknowndinoflagellateplastidprotein,orthatyieldedatophit againstaphaeodactylum-nucleus[encodedproteinexpectvalue<1x10 [5 )wereremoved. ThelongestORFsfromsequencesthatwerefoundtopossessplausible3'ends,andwere homologoustoplastid[encodedproteinsinphaeodactylum-thathavenotpreviouslybeen identifiedinperidinindinoflagellateplastidswereextracted,andsearchedforthepresence ofplastid[andmitochondria[targetingsequencesasabove.thesequencesforwhichno plausibleblasthitwasfoundandthusmaycontainnovelplastidorfs),weresearched againstoneanotherusingreciprocaltblastx/tblastxsearches.sequencesthatwerefound tomatchoneanotherwithareciprocalblasthitexpectvalueof<1x10 [5 andthusmight correspondtoconservednovelplastidproteins)wereretained,assembledintocontigs,and alignedusinggeneiousv4.76,asabove.thefinalannotationsforeachpolyt)[containing sequence,aswellasfullnucleotideandproteinsequencesforthepossiblenovelplastid ORFsidentified,arepresentedinTableS2. Phylogeneticanalysis! 16