Genome-wide analysis of ATP-binding cassette (ABC) proteins in a model legume plant, Lotus japonicus: comparison with Arabidopsis ABC protein family


DNA Research 13, 205–228 (2006)
doi:10.1093/dnares/dsl013

Akifumi Sugiyama,1 Nobukazu Shitan,1 Shusei Sato,2 Yasukazu Nakamura,2 Satoshi Tabata,2 and Kazufumi Yazaki1,*

1 Laboratory of Plant Gene Expression, Research Institute for Sustainable Humanosphere, Kyoto University, Gokasho, Uji 611-0011, Japan
2 Kazusa DNA Research Institute, 2-6-7, Kazusa-Kamatari, Kisarazu, Chiba, 292-0812, Japan

(Received 3 August 2006; revised 30 October 2006; published online 12 December 2006)

Abstract

ATP-binding cassette (ABC) proteins constitute a large family in plants, with more than 120 members each in Arabidopsis and rice, and have various functions including the transport of auxin and alkaloids, as well as the regulation of stomatal movement. In this report, we carried out a genome-wide analysis of ABC protein genes in a model legume plant, Lotus japonicus. For analysis of the Lotus genome sequence, we devised a new method, domain-based clustering analysis, in which domain structures such as the nucleotide-binding domain (NBD) and transmembrane domain (TMD), rather than full-length amino acid sequences, are compared phylogenetically with each other. This method enabled us to characterize fragments of ABC proteins, which frequently appear in a draft sequence of the Lotus genome. We identified 91 putative ABC proteins in L. japonicus, i.e. 43 full-size, 40 half-size and 18 soluble putative ABC proteins. The characteristic feature of the composition is that Lotus has an unusually large number of paralogs similar to AtMRP14 and AtPDR12, with at least six and five members, respectively. Expression analysis of the latter genes by real-time quantitative reverse transcription PCR revealed their putative involvement in the nodulation process.
Key words: ABC protein; domain-based clustering analysis; genome-wide analysis; Lotus japonicus; SMC subfamily

Communicated by Mikio Nishimura
* To whom correspondence should be addressed. Tel. +81-774-38-3617, Fax +81-774-38-3623, E-mail: yazaki@rish.kyoto-u.ac.jp

© The Author 2006. Kazusa DNA Research Institute. The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

An Inventory of ABC Proteins in Lotus japonicus. A. Sugiyama et al. [Vol. 13, No. 5]

1. Introduction

ATP-binding cassette (ABC) proteins constitute one of the largest families in plants, with more than 120 members each in Arabidopsis and rice, and function as transporters, channel regulators and molecular switches. ABC proteins share highly conserved amino acid sequence domains designated nucleotide-binding domains (NBDs). Each NBD contains three characteristic motifs: a Walker A box [GX4GK(ST)], a Walker B box [(RK)X3GX3L(hydrophobic)3],1 and an ABC signature [(LIVMFY)S(SG)GX3(RKA)(LIVMYA)X(LIVFM)(AG)],2 the last of which is situated between the two Walker boxes. Most ABC proteins contain one or two transmembrane domains (TMDs), which generally consist of 4–6 transmembrane α-helices, while several members lacking a TMD appear to be soluble proteins. The majority of eukaryotic members characterized so far are full-size ABC proteins, which contain two NBDs and two TMDs in a single polypeptide, in either the forward (TMD1-NBD1-TMD2-NBD2) or the reverse (NBD1-TMD1-NBD2-TMD2) orientation. Those with one NBD and one TMD are referred to as half-size ABC proteins.

Inventories of plant ABC proteins are available for Arabidopsis and rice,3–5 whereas little is known about ABC proteins in an important family, Fabaceae. Fabaceae, which is composed of 700 genera and 20 000 species,6 is the third largest plant family after Orchidaceae and Asteraceae and is of major agricultural importance among dicots. A hallmark feature of legumes is their ability to obtain nitrogen-containing nutrients via symbiosis with soil microbes. This ability is important not only for agriculture but also for the environment, as it can replace synthetic nitrogen fertilizers. Thus, studies of the mechanism of symbiotic nitrogen fixation (SNF) between fabaceous plants and rhizobia are of particular importance from agricultural and environmental viewpoints as well as for basic science on plant-microbe interactions.

In this report we provide an inventory of ABC proteins in Lotus japonicus, a model legume whose genome sequence information is available from the Lotus genome project of Kazusa DNA Research Institute. As an informatics approach, we have applied a new method, domain-based clustering analysis, to the phylogenetic analyses of NBDs and TMDs of ABC proteins. By comparing Lotus ABC proteins with those of Arabidopsis and rice, we have identified the features of Lotus ABC proteins and carried out expression analysis of the genes characteristic of Lotus with respect to nodulation and SNF.

2. Materials and Methods

2.1. Domain-based clustering analysis of ABC proteins

The amino acid sequences of all ABC proteins of Arabidopsis were divided into fragments of NBDs and TMDs. For NBD fragments, we used 220-amino-acid sequences starting 10 residues ahead of the consensus glycine of Walker A. TMDs of Arabidopsis ABC proteins were extracted from the web page ARAMEMNON (http://aramemnon.botanik.uni-koeln.de).7 We used the predicted TMD regions of AtATH1, AtMDR1, AtMRP1, AtWBC1 and AtPDR1 as representatives of each subfamily ABCA′, B, C and G, respectively, to define the TMD regions of the other members in the respective subfamilies, where multiple alignments were made with the ClustalW program.8 AtMRP6, 11 and 15 are ABCC members that lack the N-terminal extension T0 and are therefore not found in the T0 cluster in Fig. 1.

NBDs of L. japonicus were extracted from Lotus fragments with ABC signature(s) as described above for the phylogenetic analyses of NBDs. Fragments containing <220 amino acids were also used if they had an ABC signature. The shortest fragment was 75 amino acids (Ljwgs 041501.2 in the ABCC subfamily), but most of the fragments contained a large part of the NBD. For the phylogenetic analyses of TMDs, Lotus fragments that showed amino acid similarity to >50% of the TMD region of their Arabidopsis counterparts were used. Multiple alignments were performed using the ClustalW program with the default settings, and phylogenetic trees were viewed with TreeView.9

2.2. Identification of ABC proteins in L. japonicus genome

All putative amino acid sequences of L. japonicus available from the genome sequencing project of Kazusa DNA Research Institute were used as query sequences for BLASTP searches10 against the proteome of A. thaliana. L. japonicus proteins whose top hits in the BLAST search were ABC proteins were retained for further analysis. Each Lotus protein was aligned with its Arabidopsis counterparts using the ClustalW program to find the consensus sequences of Walker A, Walker B and the ABC signature.

2.3. Identification of SMC proteins in the rice genome

Arabidopsis SMC proteins were used as query sequences for BLASTP searches.10 Putative SMC proteins of rice were aligned with Arabidopsis SMC proteins, and the presence of the Walker A box, Walker B box and the ABC signature was confirmed.

2.4. Plant materials

Seeds of L. japonicus cv. MG-20 were scarified with sandpaper (no. 120) and surface-sterilized with 1% sodium hypochlorite for 10 min.
Surface-sterilized seeds were sown on autoclaved vermiculite supplemented with NF medium11 and germinated at 25 °C under illumination with a 16/8 h (light/dark) photoperiod (120 µmol m−2 s−1). One-week-old seedlings were inoculated with the symbiotic bacterium Mesorhizobium loti strain Tono,12 which had been cultured in YEM medium13 for 2 days at 25 °C in the dark. Uninoculated plants were used as a negative control. Both inoculated and uninoculated plants were grown in a growth chamber at 25 °C with a 16/8 h (light/dark) photoperiod (120 µmol m−2 s−1); nodule formation was observed only in inoculated plants. Three weeks after inoculation, leaves, stems, roots and nodulated roots were harvested. These organs were immediately frozen in liquid nitrogen and kept at −80 °C until the extraction of total RNA.

Figure 1. Phylogenetic relationship of NBDs and TMDs of Arabidopsis ABC proteins. The amino acid sequences of NBDs of all Arabidopsis ABC proteins and those of TMDs of membrane-localized ABC proteins in Arabidopsis were aligned using the ClustalW program as described in Materials and Methods. The nomenclature of Sanchez-Fernandez et al.3 was used, and the abbreviations of ABC proteins are as follows: ATH, ABC-two-; ATM, ABC transporter of mitochondria; GCN, general control non-repressible; MDR, multi-drug resistance; MRP, multidrug resistance-associated protein; NAP, non-intrinsic ABC protein; PDR, pleiotropic drug resistance; PMP, peroxisomal membrane protein; RLI, RNase L inhibitor; SMC, structural maintenance of chromosome; TAP, transporter associated with antigen processing; WBC, white-brown complex. N1 and N2 refer to NBD1 and NBD2, respectively, and T0, T1 and T2 represent TMD0, TMD1 and TMD2, respectively. In the systematic names used in the text, ATH is ABCA′; MDR, TAP and ATM are ABCB; MRP is ABCC; PMP is ABCD; RLI is ABCE; GCN is ABCF; and WBC is ABCG.

2.5. RNA isolation and real-time reverse transcription (RT) PCR

Total RNA was isolated with the RNeasy Plant Mini Kit (Qiagen, Valencia, CA) according to the manufacturer's instructions. Reverse transcription was done with SuperScript III reverse transcriptase (Invitrogen, CA), followed by incubation with RNase H (Invitrogen, CA). Real-time PCR reactions were performed with the Rotor-Gene 3000A (Corbett Research, Australia), using Platinum SYBR Green qPCR SuperMix-UDG (Invitrogen, CA) according to the manufacturer's instructions. Briefly, the PCR reaction mixture consisted of 10 ng of cDNA template, 5 pmol of primers, 1 µl of fluorescent probe provided by the above kit and 12.5 µl of Platinum Quantitative PCR SuperMix-UDG in a total volume of 25 µl, and the standard reaction conditions were as follows: 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s, 55 °C for 30 s and 72 °C for 30 s. The primers used to detect each mRNA species are listed in Table 1.

Table 1. Oligonucleotide primers used in this work
Primer name | Sequence (5′-3′)
Ljwgs 008083.1 FW | CTT GGC ATC AAT GAG GGA AT
Ljwgs 008083.1 RV | CAA TGT GAT TCT GCG GAC TG
Ljwgs 041501 FW | ATT GGA GCA CTG GAC AAA GG
Ljwgs 041501 RV | GTT GCT TCA TCC AGC ACC AA
Ljwgs 051480.1 FW | ATC GGC CTT CAT GAT TTG AG
Ljwgs 051480.1 RV | CAA CTG ACA CTT GCC GAG AA
Ljwgs 147765.1 FW | TGA TCA CAG TTG CAC ACA GG
Ljwgs 147765.1 RV | AGG TTC GTC GGC TCA TCA TA
chr5.CM0456.6.2 FW | GGA TAT TGG TGC TCG ATG AAG
chr5.CM0456.6.2 RV | GTG CAA TCC ATC ACA GTT GG
Ljwgs 020627.1 FW | GGG CAC ACA AAG ATC AAC CT
Ljwgs 020627.1 RV | CAG CTG GGT GGC TCT TAG AC
Ljwgs 060957.1 FW | TTT GCT GAA GCC TTC CAG TT
Ljwgs 060957.1 RV | TTA GCA GCT CCT TCC GGT TA
Ljwgs 077747.1 FW | TGA AGG CAG CAG CAC TAG AA
Ljwgs 077747.1 RV | CCC GAA ATA CCT CGA ATC AT
Ljwgs 080010.1 FW | TTC TCG CAG CCT TTC TTG AT
Ljwgs 080010.1 RV | GGT CCC AGA AGA TTG TTC CA
Ljwgs 085739.1 FW | GAA CCA ACT TCT GGG CTT GA
Ljwgs 085739.1 RV | AAT GCT AGG CTG ATG GAT GG
chr3.CM0026.70 FW | TTA CCG GCA GAG GTT GAT TC
chr3.CM0026.70 RV | GGG CAA ACC AAC AAG TGA GT
chr3.CM0026.70.1 FW | TGT TCA ATG GAC TGG CTG AG
chr3.CM0026.70.1 RV | CTG AGG ATC CAT GAG GCA AT
chr3.CM0026.74 FW | GAT GGT TGC GTC GCA GTT
chr3.CM0026.74 RV | CTA ACG TCT TTG GAA GTT GAA G
Actin FW | CAA CTG GGA CGA YAT GGA GA
Actin RV | GAG TCA TCT TCT CTC TGT TGG CC

3. Results and Discussion

3.1. Domain-based clustering analysis of ABC proteins

The genome sequences of L. japonicus, available from the genome sequencing project of Kazusa DNA Research Institute, consist of sequences derived from transformation-competent artificial chromosome (TAC) clones, bacterial artificial chromosome (BAC) clones and whole genome shotgun sequencing. The estimated coverage is 70% (331 Mb) of the whole genome (470 Mb),14 and these sequences will be publicly available soon. Since these sequences cover 90% of the expressed-sequence tags registered in the database (S. Sato et al., unpublished data), i.e. a large part of the euchromatin region has been sequenced, we can use these sequence data to analyze the ABC protein family in a genome-wide manner. Our analysis in L. japonicus provides a valuable in silico tool for other researchers to obtain information from the data of other genome projects, which mainly aim to sequence the euchromatin region.

As full-size ABC proteins are large, typically composed of >1200 amino acids, most of the putative open reading frames (ORFs) estimated in the genome sequence are fragmentary, i.e. the full-length sequence was obtained for only a limited number of members. Thus, we have devised a new method, domain-based clustering analysis, to classify the member composition of ABC proteins, in which the amino acid sequences of NBDs and TMDs, instead of the entire proteins, are used for clustering.

First, we applied this method to Arabidopsis ABC proteins to assess its validity. Sanchez-Fernandez et al.3 reported that Arabidopsis contains 129 ABC proteins, and van den Brule and Smart15 later added two more pleiotropic drug resistance (PDR) members by a detailed search of this subfamily in the Arabidopsis genome, resulting in 131 ABC proteins in Arabidopsis. Our analysis of their NBDs, however, revealed that 8 of the 131 members, i.e. AtATH8 (ABC-two- 8) (At2g39190), AtATH9 (At2g40090), AtATH10 (At4g01660), AtATH13 (At5g64940), AtNAP1 (non-intrinsic ABC protein 1) (At4g04770), AtNAP4 (At1g03900), AtNAP5 (At1g71330) and AtNAP6 (At1g32500), do not have consensus NBD sequences, including an ABC signature, despite being annotated as ABC proteins. Thus, these sequences were excluded from the domain-based clustering analysis of NBDs in this study, and members of the structural maintenance of chromosome (SMC) subfamily, which have a largely separated NBD, were also not analyzed.

We then extracted fragments of NBDs and TMDs from 119 sequences as described in Materials and Methods. The fragments were named to reflect their position in the full-length polypeptide sequence; for example, NBD1, NBD2, TMD1 and TMD2 of AtMDR1 were designated AtMDR1-N1, AtMDR1-N2, AtMDR1-T1 and AtMDR1-T2, respectively. The first TMD of the members of the multi-drug-resistance-associated protein (MRP) subfamily was indicated as T0.

A phylogenetic tree was constructed from all amino acid sequences of NBDs (Fig. 1). In Arabidopsis, NBD1 and NBD2 of full-size ABC proteins, as well as of soluble ABC proteins having two NBDs such as the GCN (general control non-repressible) subfamily, clustered separately, except in the MDR and NAP subfamilies. Rice ABC proteins5 gave a nearly identical relationship of NBD sequences. This suggests that in the MDR subfamily the similarity between NBD1 and NBD2 is clearly higher than in other subfamilies, whereas NAP members are very divergent in their amino acid sequences. This phylogenetic relationship among NBDs is nearly identical to that of full-length proteins.3

We then applied this method to all amino acid sequences of TMDs to construct the phylogenetic tree (Fig. 1). TMD1 and TMD2 of all full-size ABC protein subfamilies, including the MDR subfamily, clearly formed separate clusters, indicating that the similarity among the TMD1s of full-size ABC proteins is higher than the similarity between TMD1 and TMD2 in the same molecule.
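As a rough illustration of the NBD fragment extraction described in Materials and Methods, the Python sketch below scans a protein sequence for the Walker A motif, cuts a 220-residue window starting 10 residues ahead of it, and keeps the window only if it contains an ABC signature. It is not the authors' implementation. The function name, the choice of the first glycine of the Walker A match as the "consensus glycine", and the numbering of fragments by their order in the input sequence are assumptions made for this example; Walker B is left out because the text does not enumerate the hydrophobic residue class.

```python
import re

# Motifs transcribed from the bracket notation in the Introduction.
WALKER_A = re.compile(r"G.{4}GK[ST]")  # GX4GK(ST)
ABC_SIGNATURE = re.compile(r"[LIVMFY]S[SG]G.{3}[RKA][LIVMYA].[LIVFM][AG]")

NBD_LENGTH = 220  # residues per NBD fragment (Section 2.1)
LEAD = 10         # residues kept ahead of the Walker A glycine (Section 2.1)


def extract_nbd_fragments(name, sequence):
    """Return (label, fragment) pairs for NBD-like windows in one sequence.

    Assumption: the window is anchored on the first glycine of each
    Walker A match. Windows shorter than NBD_LENGTH are kept as long as
    they contain an ABC signature, as described for short Lotus fragments.
    """
    fragments = []
    for match in WALKER_A.finditer(sequence):
        start = max(0, match.start() - LEAD)
        window = sequence[start:start + NBD_LENGTH]
        if ABC_SIGNATURE.search(window):
            # Hypothetical labelling: N1, N2, ... by order of occurrence.
            label = f"{name}-N{len(fragments) + 1}"
            fragments.append((label, window))
    return fragments
```

Applied to a full-size protein with two NBDs, a call such as `extract_nbd_fragments("AtMDR1", seq)` would yield labels in the style of AtMDR1-N1 and AtMDR1-N2. For an anonymous genomic fragment, the order-based label is only a placeholder, since in the paper the assignment to NBD1 or NBD2 comes from the clustering itself.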
TMDs of half-size ABC transporters such as TAP (transporter associated with antigen processing) and ATM (ABC transporter of mitochondria) also tended to cluster together. These findings indicate that TMD1 and TMD2 sequences are each conserved well enough to distinguish TMDs of ABC proteins from those of other membrane proteins in sequence-based searches of the Lotus genome. Comparison of these phylogenetic trees with those of full-length polypeptides3 shows that domain-based clustering analysis can reasonably be applied to fragments of L. japonicus ABC proteins. In MRPs, TMD0 was divided into two clusters, whereas TMD1 and TMD2 each formed a single group. Interestingly, the split TMD0 clusters appear to reflect the predicted subcellular localization in either the vacuolar or the plasma membrane, suggesting that TMD0 may determine the target membrane.

These results indicate that each NBD or TMD represents the whole protein sequence. Domain-based clustering analysis can therefore be used to classify an anonymous fragment of an ABC protein containing either an NBD or a TMD, obtained from the genome sequence, into the appropriate subfamily, and also to determine whether the NBD or TMD belongs to the N- or C-terminal region. These results also strongly suggest that NBD1 of full-size ABC proteins is not equivalent to NBD2, except in MDR subfamily members, and that TMD1 of full-size ABC proteins is not equivalent to TMD2 either, even in the MDR subfamily. Frequent gene duplication of NBDs has been suggested from the phylogenetic analysis of NBDs of human ABC protein genes.16 The phylogenetic analysis of Arabidopsis TMDs also provides evidence for frequent duplication of TMDs in a similar manner to NBDs, i.e. through several independent gene duplication events, because the diversity between the two TMDs in a single polypeptide is higher than that within the TMD1s or TMD2s of a subfamily.

3.2. Identification of ABC proteins in L. japonicus

In the genome of L. japonicus, we found 394 ORFs that showed higher similarity to ABC proteins than to any other proteins of Arabidopsis. Among those 394 putative polypeptides, 112 ORFs contained at least one ABC signature (Table 2). TM, BM, CM and Ljwgs in Table 2 refer to TAC clone, BAC clone, contig of TAC and BAC clones and whole genome shotgun, respectively. The nomenclature of Sanchez-Fernandez et al.3 has been widely used for plant ABC proteins by other plant researchers,15,17,18 while Garcia et al.5 used another nomenclature for rice ABC proteins. In this report we employed the nomenclature of human ABC proteins,16 which is generally accepted by most mammalian ABC protein researchers and is more common for eukaryotic ABC proteins. Most subfamilies of plant ABC proteins have corresponding counterparts in the human genome, which are classified into eight subfamilies ABCA–ABCG, whereas plant-specific subfamilies like PDR are named with their conventional designation.

3.3. ABCA and ABCA′ subfamily

The plant ABCA subfamily has been considered to consist of full-size and half-size proteins, and the latter are also called the ATH subfamily, which stands for ABC-two-. As all human ABCA proteins are full-size, it may be confusing to use the conventional name ABCA for half-size members, and the name ABC two has not been commonly used in recent years. We would like to propose the name ABCA′ subfamily for the half-size members (ATHs). With this classification, full-size ABCA members and the half-size members with similar sequences can be clearly distinguished from other subfamilies. Only one full-size ABCA member (AtABCA1) is present in the Arabidopsis genome, whereas none has been found in the rice genome.3,18 In the genome sequences of L. japonicus, we found nine fragments similar to

Table 2. Inventory of Lotus japonicus putative ABC proteins (-, no entry)
ORF name | Amino acids | Position | ABC signature | Arabidopsis | Common name | Rice

ABCA, full-size proteins
Ljwgs 011569.2 | 313 | NBD1-TMD2 | - | At2g41700 | AtAOH1 | -
Ljwgs 024406.1 | 139 | TMD1 | - | At2g41700 | AtAOH1 | -
Ljwgs 026106.1.1 | 118 | TMD2 | - | At2g41700 | AtAOH1 | -
Ljwgs 031377.1 | 71 | NBD1 | - | At2g41700 | AtAOH1 | AP003947
Ljwgs 039697.1 | 106 | NBD1 | - | At2g41700 | AtAOH1 | -
Ljwgs 047252.1 | 88 | TMD2 | - | At2g41700 | AtAOH1 | -
Ljwgs 058746.1 | 48 | NBD2 | - | At2g41700 | AtAOH1 | -
Ljwgs 058851.1 | 71 | TMD2 | - | At2g41700 | AtAOH1 | -
Ljwgs 115200.1 | 243 | - | - | At2g41700 | AtAOH1 | -

ABCA′, half-size proteins
Ljwgs 047944.1 | 142 | NBD | - | At3g47730 | AtATH1 | AP003947
TM1130.25 | 966 | TMD-NBD | 1 | At3g47730 | AtATH1 | AP003947
TM1130.15 | 87 | NBD | - | At3g47740 | AtATH2 | AP003947
TM1130.8.2 | 232 | TMD | - | At3g47740 | AtATH2 | AP003622
Ljwgs 120677.1.1 | 172 | TMD | - | At3g47750 | AtATH3 | AP003622
Ljwgs 028818.1 | 134 | TMD | - | At3g47770 | AtATH5 | AP003622
Ljwgs 026271.2 | 300 | TMD | - | At3g47780 | AtATH6 | AP003947
Ljwgs 063138.1 | 103 | TMD | - | At3g47780 | AtATH6 | AP005508
Ljwgs 063597.1 | 147 | TMD | - | At3g47780 | AtATH6 | AP005508
Ljwgs 118121.1 | 65 | NBD | - | At3g47780 | AtATH6 | AP003947
TM1130.28 | 43 | NBD | - | At3g47780 | AtATH6 | AP003947
TM1130.29 | 738 | TMD-NBD | 1 | At3g47780 | AtATH6 | AP003947
TM1130.8 | 188 | TMD | - | At3g47780 | AtATH6 | AP003947
TM1130.8.1 | 257 | TMD | - | At3g47780 | AtATH6 | AP003947
Ljwgs 020284.1 | 299 | - | - | At2g39190 | AtATH8 | AP004240
Ljwgs 032468.1 | 167 | - | - | At2g39190 | AtATH8 | AC069158
Ljwgs 058241.1 | 93 | - | - | At2g39190 | AtATH8 | AC069158
Ljwgs 028190.1 | 464 | - | - | At2g39190 | AtATH8 | -
Ljwgs 041099.2 | 356 | - | - | At2g39190 | AtATH8 | AC134346
Ljwgs 063588.1 | 162 | - | - | At2g39190 | AtATH8 | AP005116
chr4.cm0247.13 | 427 | - | - | At2g39190 | AtATH8 | AF480497
chr5.cm0072.29 | 656 | - | - | At2g39190 | AtATH8 | AL606455
chr6.cm0686.12 | 779 | - | - | At2g39190 | AtATH8 | AP005111
Ljwgs 023381.2 | 185 | - | - | At2g40090 | AtATH9 | -
Ljwgs 024227.1 | 334 | - | - | At2g40090 | AtATH9 | AF480497
Ljwgs 041635.1 | 226 | - | - | At2g40090 | AtATH9 | AP005764
chr5.cm0048.9 | 431 | - | - | At4g01660 | AtATH10 | -
chr6.tm0722.14.2 | 95 | - | - | At4g01660 | AtATH10 | AP005905
Ljwgs 011402.1 | 182 | TMD-NBD | - | At5g03910 | AtATH12 | AL662945
Ljwgs 028695.1 | 252 | TMD-NBD | - | At5g03910 | AtATH12 | AL662945
Ljwgs 005810.1 | 564 | - | - | At5g64940 | AtATH13 | AP005116
Ljwgs 019322.1 | 168 | - | - | At5g64940 | AtATH13 | AP005116
Ljwgs 036003.2 | 64 | - | - | At5g64940 | AtATH13 | AP005116
Ljwgs 058119.1 | 66 | - | - | At5g64940 | AtATH13 | AP005116

Table 2. continued (-, no entry)
ORF name | Amino acids | Position | ABC signature | Arabidopsis | Common name | Rice

ABCB, full-size proteins
Ljwgs 041935.1 | 124 | NBD1 | 1 | At2g36910 | MDR1 | AJ535058
Ljwgs 052197.1 | 210 | - | - | At2g36910 | MDR1 | AC126223
Ljwgs 057472.1 | 160 | TMD1 | - | At2g36910 | MDR1 | AP004623
Ljwgs 087515.1 | 200 | TMD1 | - | At2g36910 | MDR1 | AJ535058
Ljwgs 113075.1 | 121 | NBD1 | - | At2g36910 | MDR1 | AJ535058
Ljwgs 133042.1 | 150 | TMD2 | - | At2g36910 | MDR1 | AJ535058
chr2.cm0346.21 | 141 | NBD1 | - | At2g36910 | MDR1 | AJ535058
Ljwgs 021469.1 | 460 | TMD2-NBD2 | - | At4g25960 | AtMDR2 | AJ535061
Ljwgs 032156.1 | 93 | TMD2 | - | At4g25960 | AtMDR2 | AJ535061
Ljwgs 049565.1 | 165 | TMD1 | - | At4g25960 | AtMDR2 | AJ535061
Ljwgs 147583.1 | 75 | TMD1 | - | At4g25960 | AtMDR2 | AJ535061
Ljwgs 016211.1 | 222 | NBD2 | 1 | At4g01820 | AtMDR3 | AJ535068
Ljwgs 147232.1 | 132 | NBD1 | 1 | At4g01820 | AtMDR3 | AJ535066
Ljwgs 001709.1 | 1106 | TMD1-NBD1-TMD2 | 1 | At2g47000 | AtMDR4 | AJ535067
Ljwgs 019419.1 | 633 | NBD1-TMD2 | 1 | At2g47000 | AtMDR4 | AJ535067
Ljwgs 044163.1 | 438 | NBD1 | 1 | At2g47000 | AtMDR4 | AJ535067
Ljwgs 072298.1 | 336 | TMD1-NBD1 | 1 | At2g47000 | AtMDR4 | AJ535067
chr1.tm0637.22 | 355 | TMD1-NBD1 | - | At2g39480 | AtMDR6 | AJ535062
chr1.tm0637.4 | 155 | TMD1 | - | At2g39480 | AtMDR6 | AJ535062
chr1.tm0637.8 | 368 | TMD2 | - | At2g39480 | AtMDR6 | AJ535062
chr5.cm0260.59.1 | 985 | TMD1-NBD1-TMD2 | - | At2g39480 | AtMDR6 | AJ535062
Ljwgs 011048.2 | 402 | TMD1 | - | At1g02520 | AtMDR8 | AJ535067
Ljwgs 011357.1 | 391 | TMD1 | - | At1g02520 | AtMDR8 | AJ535067
Ljwgs 020628.1 | 186 | TMD1 | - | At1g02520 | AtMDR8 | AJ535067
Ljwgs 028930.1 | 328 | TMD1 | - | At1g02520 | AtMDR8 | AJ535067
Ljwgs 032552.1 | 307 | TMD1 | - | At1g02520 | AtMDR8 | AJ535068
Ljwgs 095176.1 | 258 | TMD2-NBD2 | - | At1g02520 | AtMDR8 | AJ535067
Ljwgs 098535.1 | 290 | TMD2-NBD2 | - | At1g02520 | AtMDR8 | AJ535067
chr1.cm0122.17 | 125 | TMD1 | - | At1g02520 | AtMDR8 | AJ535065
chr1.cm0349.47 | 261 | TMD1 | - | At1g02520 | AtMDR8 | AJ535067
chr6.cm0118.16 | 398 | TMD1 | - | At4g18050 | AtMDR9 | AJ535064
chr6.cm0118.6 | 1251 | TMD1-NBD1-TMD2-NBD2 | 2 | At4g18050 | AtMDR9 | AJ535064
chr6.cm0118.7 | 689 | TMD2-NBD2 | 1 | At4g18050 | AtMDR9 | AJ535064
Ljwgs 030966.1 | 93 | NBD2 | - | At1g10680 | AtMDR10 | AJ535061
Ljwgs 037651.1.1 | 52 | NBD2 | - | At1g10680 | AtMDR10 | AJ535061
Ljwgs 023222.1 | 100 | TMD1 | - | At3g28860 | AtMDR11 | AJ535057
Ljwgs 052502.1 | 386 | NBD1 | 1 | At3g28860 | AtMDR11 | AJ535059
Ljwgs 089157.1 | 342 | TMD2-NBD2 | - | At3g28345+At3g28344 | AtMDR13 | AJ535069
chr2.cm0081.24.3 | 225 | TMD1 | - | At3g28345+At3g28344 | AtMDR13 | AP005849
chr2.cm0081.24.4 | 159 | TMD1 | - | At3g28345+At3g28344 | AtMDR13 | AP005849
chr2.cm0081.40 | 99 | TMD1 | - | At3g28345+At3g28344 | AtMDR13 | AP005849
chr3.bm1543.1 | 419 | TMD2-NBD2 | 1 | At3g28345+At3g28344 | AtMDR13 | AP005849
chr3.bm1543.3 | 396 | TMD1-NBD1 | - | At3g28345+At3g28344 | AtMDR13 | AP005849
Ljwgs 024866.1 | 444 | NBD1-TMD2 | - | At3g55320 | AtMDR14 | AJ535062
chr1.tm0637.6.1 | 396 | TMD2-NBD2 | - | At3g55320 | AtMDR14 | AJ535062

212 An Inventory of ABC Proteins in Lotus japonicus [Vol. 13, Table 2. continued ORF name Amino acids Position ABC signature Arabidopsis Common name Rice CM1324.10.2 1229 TMD1-NBD1-TMD2-NBD2 1 At1g27940 AtMDR15 AJ535059 Ljwgs 001709.2 391 TMD2-NBD2 1 At3g62150 AtMDR17 AJ535068 Ljwgs 007903.1 913 TMD1-NBD1-TMD2-NBD2 At3g62150 AtMDR17 AJ535068 Ljwgs 025623.1 499 TMD2-NBD2 1 At3g62150 AtMDR17 AJ535067 Ljwgs 122328.1 161 NBD2 1 At3g62150 AtMDR17 AJ535068 Ljwgs 125461.1 245 NBD2 1 At3g62150 AtMDR17 AJ535068 TM1070.2 173 NBD1 At3g62150 AtMDR17 AJ535067 TM1070.8 72 NBD2 At3g62150 AtMDR17 AJ535067 chr3.bm1543.1.1 197 NBD1-TMD2 1 At3g62150 AtMDR17 AJ535069 Ljwgs 043188.1 249 NBD1 1 At3g28360 AtMDR18 AJ535055 chr2.cm0081.24.1 90 NBD2 At3g28360 AtMDR18 AP005849 Ljwgs 012642.1 494 TMD1-NBD1 At3g28380 AtMDR19 AL606614 chr2.cm0081.24.2 878 NBD1-TMD2-NBD2 2 At3g28380 AtMDR19 AP005849 chr2.cm0081.38 462 NBD1-TMD2-NBD2 1 At3g28380 AtMDR19 AP005849 chr2.cm0065.29 1229 TMD1-NBD1-TMD2-NBD2 2 At3g28390 AtMDR20 AJ535055 ABCB, half-size proteins Ljwgs 010586.1 152 TMD At5g58270 AtATM3 AP000391 chr1.cm0001.8 324 TMD-NBD 1 At1g70610 AtTAP1 AP003436 chr1.cm0001.8.1 342 TMD At1g70610 AtTAP1 AY013245 chr3.cm0106.33.1 129 NBD At1g70610 AtTAP1 AP003436 chr4.cm0500.31 650 TMD-NBD 1 At5g39040 AtTAP2 AY013245 ABCC, full-size proteins Ljwgs 060046.1.1 137 NBD1 1 At1g30400 AtMRP1 AL662970 Ljwgs 103181.1 133 NBD2 1 At1g30400 AtMRP1 AL662970 TM1408.7.1 261 At1g30400 AtMRP1 AL662970 Ljwgs 029227.1 221 TMD1 At2g34660 AtMRP2 AL662970 Ljwgs 034974.1 369 TMD2-NBD2 At2g34660 AtMRP2 AL662970 Ljwgs 036703.1 256 TMD2-NBD2 At2g34660 AtMRP2 AL662970 Ljwgs 047950.1 138 NBD2 At2g34660 AtMRP2 AL662970 Ljwgs 069801.1 220 NBD2 1 At2g34660 AtMRP2 AL662970 Ljwgs 115556.2 106 NBD1 At2g34660 AtMRP2 AL662970 TM1408.16.3 55 TMD1 At2g34660 AtMRP2 AL662970 TM1408.6.1 121 NBD2 1 At2g34660 AtMRP2 AL662970 Ljwgs 013989.1.1 455 TMD2-NBD2 1 At3g13080 AtMRP3 AP003215 Ljwgs 024210.1 387 TMD2-NBD2 1 At3g13080 AtMRP3 AP003215 Ljwgs 
034294.1 332 At3g13080 AtMRP3 AP003215 Ljwgs 140594.1 171 At3g13080 AtMRP3 AJ535080 Ljwgs 147740.0.1 69 NBD1 At3g13080 AtMRP3 Ljwgs 032681.1 340 TMD2-NBD2 1 At2g47800 AtMRP4 AJ535074 Ljwgs 041352.1 348 TMD2 At2g47800 AtMRP4 AJ535074 Ljwgs 068839.1.1 29 NBD2 At2g47800 AtMRP4 AJ535073 Ljwgs 071735.1 409 TMD0-TMD1 At2g47800 AtMRP4 AJ535074 Ljwgs 074216.1 83 NBD2 1 At2g47800 AtMRP4 AJ535074 Ljwgs 075586.1 187 TMD2 At2g47800 AtMRP4 AJ535074 Ljwgs 005728.1 46 TMD2 At1g04120 AtMRP5 AJ535076 Ljwgs 079161.1 259 TMD2-NBD2 At1g04120 AtMRP5 AJ535076

No. 5] A. Sugiyama et al. 213 Table 2. continued ORF name Amino acids Position ABC signature Arabidopsis Common name Rice Ljwgs 131851.1 222 NBD2 1 At1g04120 AtMRP5 AJ535080 TM0639.1 637 TMD0-TMD1-NBD1 1 At1g04120 AtMRP5 AJ535079 TM0639.11 71 NBD2 At1g04120 AtMRP5 AJ535076 TM0639.16 536 NBD1-TMD2 1 At1g04120 AtMRP5 AJ535079 TM0639.17 143 NBD2 1 At1g04120 AtMRP5 AJ535079 CM0355.29 1353 TMD1-NBD1-TMD2-NBD2 2 At3g21250 AtMRP6 AC135427 Ljwgs 041356.1 300 TMD1-NBD1-TMD2-NBD2 At3g21250 AtMRP6 AJ535081 Ljwgs 116942.1 274 NBD1 1 At3g21250 AtMRP6 AC135427 Ljwgs 131098.1 299 TMD1-NBD1 At3g21250 AtMRP6 AC135427 Ljwgs 008548.3 845 TMD1-NBD1-TMD2-NBD2 1 At3g60160 AtMRP9 AL606658 Ljwgs 073928.1 201 TMD2-NBD2 At3g60160 AtMRP9 AJ535080 Ljwgs 020100.1 440 TMD0-TMD1 At3g62700 AtMRP10 AJ535074 Ljwgs 024575.1 131 NBD2 1 At3g62700 AtMRP10 AJ535074 Ljwgs 063395.1 210 TMD2 At3g62700 AtMRP10 AJ535074 Ljwgs 069797.1 120 At2g07680 AtMRP11 TM0845.8 1030 TMD0-TMD1-NBD1-TMD2-NBD2 2 At2g07680 AtMRP11 AP005919 TM0845.8.1 100 TMD0 At2g07680 AtMRP11 AP005919 Ljwgs 130515.1 82 TMD1 At1g30420 AtMRP12 AL662970 Ljwgs 037518.1 150 NBD1 1 At1g30410 AtMRP13 AL662970 Ljwgs 064080.1.1 86 At1g30410 AtMRP13 AL662970 Ljwgs 087711.1 109 TMD0 At1g30410 AtMRP13 AL662970 TM1408.16.2 35 TMD1 At1g30410 AtMRP13 AL662970 TM1408.21.1 90 TMD1 At1g30410 AtMRP13 AL662970 TM1408.4 110 TMD1 At1g30410 AtMRP13 AL662970 TM1408.6 589 NBD1-TMD2-NBD2 1 At1g30410 AtMRP13 AL662970 Ljwgs 008083.1 115 NBD2 1 At3g59140 AtMRP14 AC112209 Ljwgs 013404.1 528 At3g59140 AtMRP14 AC112209 Ljwgs 018731.1 632 TMD0-TMD1-NBD1 At3g59140 AtMRP14 AC112209 Ljwgs 021713.1 464 TMD1-NBD1-TMD2 At3g59140 AtMRP14 AP005828 Ljwgs 027617.1 127 NBD1 At3g59140 AtMRP14 AP005828 Ljwgs 041501.2 75 NBD2 1 At3g59140 AtMRP14 AP005828 Ljwgs 051481.1 283 TMD2-NBD2 1 At3g59140 AtMRP14 AC112209 Ljwgs 070176.1 215 TMD2-NBD2 At3g59140 AtMRP14 AP005828 Ljwgs 075329.1 223 TMD2-NBD2 At3g59140 AtMRP14 AP005828 Ljwgs 087804.1 254 NBD2 1 At3g59140 AtMRP14 AP005828 Ljwgs 
111461.1 91 TMD2 At3g59140 AtMRP14 AC112209 Ljwgs 118475.1 251 At3g59140 AtMRP14 AC112209 Ljwgs 147765.1 113 TMD2 1 At3g59140 AtMRP14 AP005828 Ljwgs 150784.1 137 TMD1 At3g59140 AtMRP14 AP005828 TM0631.14.1 134 At3g59140 AtMRP14 AP005828 TM1746.16.1 41 NBD2 At3g59140 AtMRP14 AC112209 chr5.cm0456.12 35 NBD2 At3g59140 AtMRP14 AC112209 chr5.cm0456.4 153 TMD2 At3g59140 AtMRP14 AP005828 chr5.cm0456.4.1 234 NBD1-TMD2 At3g59140 AtMRP14 AP005828 chr5.cm0456.6.1 834 TMD0-TMD1-NBD1-TMD2 At3g59140 AtMRP14 AC112209 chr5.cm0456.6.2 437 TMD2-NBD2 1 At3g59140 AtMRP14 AP005828

214 An Inventory of ABC Proteins in Lotus japonicus [Vol. 13, Table 2. continued ORF name Amino acids Position ABC signature Arabidopsis Common name Rice chr5.cm0456.7 291 At3g59140 AtMRP14 AC112209 ABCD, full-size proteins Ljwgs 020913.2 110 NBD2 At4g39850 AtPMP2 AJ535082 Ljwgs 069382.1 81 TMD1 At4g39850 AtPMP2 AJ535082 Ljwgs 098798.1 145 NBD2 At4g39850 AtPMP2 AP004365 ABCD, half-size proteins Ljwgs 080277.1.1 206 At1g54350 AtPMP1 chr3.cm0996.4.2 56 NBD At1g54350 AtPMP1 AP002861 chr3.cm0996.8 320 TMD-NBD At1g54350 AtPMP1 AP002861 chr3.cm0996.9 263 TMD At1g54350 AtPMP1 AP002861 ABCE, half-size proteins Ljwgs 027813.1 273 At4g19210 AtRLI2 AY093583 Ljwgs 030430.0.1 198 At4g19210 AtRLI2 AY093583 chr2.cm0803.109 163 NBD1 At4g19210 AtRLI2 AY093583 chr2.cm0803.112 602 NBD1-NBD2 2 At4g19210 AtRLI2 AY093583 chr2.cm0803.122.1 47 At4g19210 AtRLI2 AE016959 chr2.cm0803.130 119 At4g19210 AtRLI2 AE016959 chr4.cm1170.68 223 At4g19210 AtRLI2 AY093583 ABCF, half-size proteins Ljwgs 005207.1 251 NBD1 1 At5g60790 AtGCN1 AP004623 Ljwgs 021611.1 292 NBD2 1 At5g60790 AtGCN1 AP004623 Ljwgs 128679.1 86 At5g60790 AtGCN1 AP006162 chr3.cm0070.21 597 NBD1-NBD2 2 At5g60790 AtGCN1 AP004623 Ljwgs 145734.1.1 76 NBD2 At5g09930 AtGCN2 Ljwgs 037511.1 139 At1g64550 AtGCN3 AP004776 Ljwgs 051124.1 175 At1g64550 AtGCN3 AP004776 Ljwgs 067287.1 55 At1g64550 AtGCN3 AP004776 Ljwgs 150406.1 71 At1g64550 AtGCN3 AP004776 chr4.cm0006.100 73 At1g64550 AtGCN3 AP004776 chr4.cm0006.94 73 At1g64550 AtGCN3 AP004776 Ljwgs 020369.1 151 At3g54540 AtGCN4 AC093180 Ljwgs 108453.1 341 NBD1-NBD2 1 At3g54540 AtGCN4 AC093180 Ljwgs 111906.1 337 NBD2 1 At3g54540 AtGCN4 AC093180 Ljwgs 114078.1 277 NBD2 1 At3g54540 AtGCN4 AC093180 TM1678.3 77 At3g54540 AtGCN4 AC093180 TM1678.3.1 485 NBD2 At3g54540 AtGCN4 AC093180 TM1746.20 710 NBD1-NBD2 1 At3g54540 AtGCN4 AC093180 chr1.cm0032.57 698 NBD1-NBD2 2 At5g64840 AtGCN5 ABCG, half-size proteins Ljwgs 017892.1.1 71 TMD At2g39350 AtWBC1 AP003271 chr5.cm0260.23 751 NBD-TMD 1 At2g39350 AtWBC1 
AC084405 TM0445.17 715 NBD-TMD At2g37360 AtWBC2 AP005124 TM0445.31 68 TMD At2g37360 AtWBC2 Ljwgs 040944.1 187 TMD At2g28070 AtWBC3 AC092263 Ljwgs 042985.1 77 NBD At2g28070 AtWBC3 AC092263 Ljwgs 118279.2 202 NBD 1 At2g13610 AtWBC5 AC084405

No. 5] A. Sugiyama et al. 215 Table 2. continued ORF name Amino acids Position ABC signature Arabidopsis Common name Rice Ljwgs 032142.1 306 TMD At5g13580 AtWBC6 AP001111 Ljwgs 023608.2 102 NBD 1 At2g01320 AtWBC7 AP003046 Ljwgs 038224.1 99 TMD At2g01320 AtWBC7 AP003046 Ljwgs 061340.1 178 TMD At2g01320 AtWBC7 AP003046 Ljwgs 012396.1 541 NBD-TMD 1 At5g52860 AtWBC8 AP003527 chr3.cm0590.31 595 NBD-TMD 1 At4g27420 AtWBC9 AP005605 chr3.cm0111.66.5 616 NBD-TMD 1 At1g53270 AtWBC10 AP002866 Ljwgs 022358.2 183 TMD At1g17840 AtWBC11 AC068950 Ljwgs 024129.1 382 NBD 1 At1g17840 AtWBC11 AP005573 Ljwgs 025750.1 388 TMD At1g17840 AtWBC11 AP005573 Ljwgs 055522.1 240 TMD At1g17840 AtWBC11 AC068950 Ljwgs 070377.1 248 At1g17840 AtWBC11 AP005573 Ljwgs 074169.1 229 NBD 1 At1g17840 AtWBC11 AP005573 Ljwgs 076115.2 141 NBD At1g17840 AtWBC11 AL662976 Ljwgs 139032.1 161 TMD At1g17840 AtWBC11 AP005573 TM1051.29 129 NBD 1 At1g17840 AtWBC11 AP005573 TM1051.32 370 At1g17840 AtWBC11 AP005573 Ljwgs 018199.1 324 NBD-TMD At1g51500 AtWBC12 AL662976 Ljwgs 045928.2 87 NBD At1g51500 AtWBC12 AP005573 Ljwgs 083434.1 183 TMD At1g51500 AtWBC12 AC135422 TM1051.11 65 NBD At1g51500 AtWBC12 AP005573 Ljwgs 112871.1 135 NBD At1g31770 AtWBC14 AP005605 chr2.cm0803.115 656 NBD-TMD 1 At1g31770 AtWBC14 AP005605 chr5.cm0311.20 650 NBD-TMD At1g31770 AtWBC14 AP005605 Ljwgs 016548.1 351 NBD-TMD 1 At3g21090 AtWBC15/AtWBC22 AC135422 Ljwgs 116062.1.1 174 NBD-TMD At3g21090 AtWBC15/AtWBC22 AC135422 Ljwgs 120471.1 182 NBD 1 At3g21090 AtWBC15/AtWBC22 AC135422 TM1051.33 106 NBD At3g21090 AtWBC15/AtWBC22 AP005573 Ljwgs 047281.2 137 NBD 1 At3g55090 AtWBC16 AP003271 chr5.tm1323.17.1 88 NBD At3g55090 AtWBC16 AC084405 Ljwgs 020780.1 127 NBD At3g55100 AtWBC17 AP005124 chr2.cm0177.41 717 NBD-TMD 1 At3g55130 AtWBC19 AP005124 Ljwgs 015205.1 156 NBD At3g53510 AtWBC20 AC084405 Ljwgs 048672.1 262 NBD 1 At3g25620 AtWBC21 AP005605 chr2.cm0749.1.1 232 NBD 1 At3g25620 AtWBC21 AP005605 Ljwgs 042343.1 284 NBD 1 At5g06530 AtWBC23 AC105730 Ljwgs 
065716.1 239 TMD At5g06530 AtWBC23 AC105730 chr1.cm0361.52 722 NBD-TMD 1 At5g06530 AtWBC23 AC105730 CM0584.36 646 NBD-TMD At5g19410 AtWBC24 AC108501 Ljwgs 056261.1 104 NBD At1g53390 AtWBC25 AP006616 Ljwgs 136409.1 84 NBD At1g53390 AtWBC25 AC146525 TM1729.1 274 TMD At1g53390 AtWBC25 AC146546 Ljwgs 010144.1 330 TMD At1g71960 AtWBC26 AC120532 Ljwgs 111595.1 250 NBD 1 At1g71960 AtWBC26 AC120532 Ljwgs 052208.1 205 NBD At3g13220 AtWBC27 AP005446 chr2.cm0210.1 80 NBD At3g13220 AtWBC27 AP005446

216 An Inventory of ABC Proteins in Lotus japonicus [Vol. 13, Table 2. continued ORF name Amino acids Position ABC signature Arabidopsis Common name Rice chr2.cm0210.1.1 611 NBD-TMD 1 At3g13220 AtWBC27 AC146546 Ljwgs 019418.1 446 NBD 1 At5g60740 AtWBC29 AC146525 Ljwgs 028633.2 137 NBD At5g60740 AtWBC29 AP006616 Ljwgs 030896.1 299 NBD At5g60740 AtWBC29 AC146546 Ljwgs 036023.1 196 NBD-TMD At5g60740 AtWBC29 Ljwgs 043992.1 98 NBD 1 At5g60740 AtWBC29 AP006616 Ljwgs 050631.1 148 NBD 1 At5g60740 AtWBC29 AP006616 Ljwgs 121101.1 221 NBD At5g60740 AtWBC29 AP006616 Ljwgs 125967.1 149 NBD At5g60740 AtWBC29 AP006616 chr3.tm0649.4 738 NBD-TMD 1 At5g60740 AtWBC29 AP006616 chr3.tm0649.4.1 239 NBD At5g60740 AtWBC29 AC146546 PDR, full-size proteins CM0680.21 78 At3g16340 AtPDR1 AJ535043 Ljwgs 040290.1 300 NBD2-TMD2 1 At3g16340 AtPDR1 AJ535043 Ljwgs 042836.1 135 NBD2 At3g16340 AtPDR1 AJ535045 Ljwgs 068083.1.1 98 NBD1 At3g16340 AtPDR1 AJ535043 chr3.cm0396.9.1 70 At3g16340 AtPDR1 AP005128 CM0680.11.2 87 TMD2 At4g15220þAt4g15230 AtPDR2 AJ535053 Ljwgs 020257.1 521 NBD2-TMD2 1 At2g29940 AtPDR3 AC099325 Ljwgs 058504.1 247 NBD1 At2g29940 AtPDR3 AC099325 Ljwgs 075895.1 210 At2g29940 AtPDR3 AC099325 Ljwgs 094283.1 223 NBD2 1 At2g29940 AtPDR3 AC099325 Ljwgs 099512.1 158 NBD1 At2g29940 AtPDR3 AJ535041 Ljwgs 106607.1 145 TMD2 At2g29940 AtPDR3 AC099325 Ljwgs 120609.1 115 TMD1 At2g29940 AtPDR3 AC099325 chr2.cm0346.26 1441 NBD1-TMD1-NBD2-TMD2 2 At2g29940 AtPDR3 AJ535042 Ljwgs 054858.1 160 NBD1 At2g26910 AtPDR4 AJ535049 Ljwgs 091889.1 199 NBD2 At2g26910 AtPDR4 AJ535049 CM0680.11.1 259 NBD1 1 At2g37280 AtPDR5 AJ535053 CM0680.11.3 144 At2g37280 AtPDR5 AJ535053 CM0680.2 161 At2g37280 AtPDR5 AJ535044 Ljwgs 147348.1 79 NBD1 At2g37280 AtPDR5 AJ535044 Ljwgs 010594.1 651 NBD1-TMD1-NBD2-TMD2 At2g36380 AtPDR6 AJ535052 Ljwgs 011812.2 362 NBD1 At2g36380 AtPDR6 AJ535047 Ljwgs 015205.2 209 NBD1 At2g36380 AtPDR6 AJ535050 Ljwgs 049495.1 191 At2g36380 AtPDR6 AJ535048 Ljwgs 054730.1 156 NBD2 At2g36380 AtPDR6 AP003827 
Ljwgs 072776.1 132 NBD1 At2g36380 AtPDR6 AJ535047 Ljwgs 082511.1 229 TMD2 At2g36380 AtPDR6 AJ535046 Ljwgs 088220.1 142 NBD1 At2g36380 AtPDR6 AJ535044 chr1.bm1697.11 139 TMD1 At2g36380 AtPDR6 AJ535052 Ljwgs 018305.1 102 NBD1 At1g15210 AtPDR7 AJ535043 Ljwgs 042126.1 131 NBD1 At1g15210 AtPDR7 AJ535043 Ljwgs 048629.1 128 NBD1 At1g59870 AtPDR8 AP003613 Ljwgs 070603.1 156 TMD2 At1g59870 AtPDR8 AJ535043 Ljwgs 148656.1.1 134 TMD2 At1g59870 AtPDR8 AJ535041

No. 5] A. Sugiyama et al. 217 Table 2. continued ORF name Amino acids Position ABC signature Arabidopsis Common name Rice chr1.cm0171.64.2 62 TMD2 At1g59870 AtPDR8 AJ535043 CM0680.11.4 237 NBD2 At3g53480 AtPDR9 AJ535053 CM0680.20 115 TMD2 At3g53480 AtPDR9 AJ535053 CM0680.20.1 528 NBD2-TMD2 1 At3g53480 AtPDR9 AJ535053 CM0680.8.1 93 TMD2 At3g53480 AtPDR9 AJ535041 CM0680.9 294 NBD1 1 At3g53480 AtPDR9 AJ535053 Ljwgs 013528.1 163 NBD1 At1g15520 AtPDR12 AJ535048 Ljwgs 014331.1 327 NBD1 At1g15520 AtPDR12 AJ535048 Ljwgs 015328.0.1 63 NBD2 At1g15520 AtPDR12 AJ535048 Ljwgs 020627.1 461 NBD1-TMD1 1 At1g15520 AtPDR12 AJ535048 Ljwgs 030985.1 286 NBD1 At1g15520 AtPDR12 AJ535044 Ljwgs 035411.2 224 TMD2 At1g15520 AtPDR12 AP005724 Ljwgs 036170.1 377 NBD2-TMD2 1 At1g15520 AtPDR12 AP002844 Ljwgs 040203.1 315 TMD1-NBD2 At1g15520 AtPDR12 AJ535048 Ljwgs 060412.1 247 TMD1 At1g15520 AtPDR12 AJ535048 Ljwgs 060957.1 311 NBD1 1 At1g15520 AtPDR12 AJ535048 Ljwgs 068056.1.1 81 TMD1 At1g15520 AtPDR12 AJ535048 Ljwgs 075704.1 176 NBD1 At1g15520 AtPDR12 AP002844 Ljwgs 077747.1 152 NBD1 1 At1g15520 AtPDR12 AJ535044 Ljwgs 080010.1 258 NBD2-TMD2 1 At1g15520 AtPDR12 AJ535046 Ljwgs 085739.1 202 NBD2 1 At1g15520 AtPDR12 AJ535045 Ljwgs 086126.1 193 TMD2 At1g15520 AtPDR12 AJ535046 Ljwgs 097826.1 98 NBD2-TMD2 At1g15520 AtPDR12 AJ535048 Ljwgs 100684.1 78 NBD2 At1g15520 AtPDR12 AJ535048 Ljwgs 114409.1 131 TMD2 At1g15520 AtPDR12 AJ535045 Ljwgs 128169.1.1 57 TMD2 At1g15520 AtPDR12 AJ535048 Ljwgs 141500.1 135 NBD1 At1g15520 AtPDR12 AJ535048 Ljwgs 146574.1 111 NBD1 At1g15520 AtPDR12 AJ535048 TM0485.4.1 70 TMD1 At1g15520 AtPDR12 AJ535048 chr3.cm0226.70 334 TMD1-NBD2 1 At1g15520 AtPDR12 AJ535048 chr3.cm0226.70.1 647 NBD1-TMD1 1 At1g15520 AtPDR12 AJ535048 chr3.cm0226.73 354 TMD2 At1g15520 AtPDR12 AJ535048 chr3.cm0226.74 1432 NBD1-TMD1-NBD2-TMD2 2 At1g15520 AtPDR12 AC099325 chr5.cm0052.11.1 126 NBD2 At1g15520 AtPDR12 AJ535048 Ljwgs 027501.1 336 NBD2 1 At1g66950 AtPDR13 AP002844 Ljwgs 042036.1 290 TMD2 At1g66950 
AtPDR13 AP002844 Ljwgs 044845.1 299 NBD2 1 At1g66950 AtPDR13 AJ535052 SMC, half-size proteins Ljwgs 032423.1 187 At3g54670 AtSMC1 AJ535209 Ljwgs 054921.1 295 At3g54670 AtSMC1 AJ535209 Ljwgs 066788.1 129 At3g54670 AtSMC1 AJ535209 Ljwgs 114651.1 335 At3g54670 AtSMC1 AJ535209 Ljwgs 128457.1 305 At3g54670 AtSMC1 AJ535209 Ljwgs 138190.1 120 At3g54670 AtSMC1 AJ535209 TM1755.25 329 At3g54670 AtSMC1 AJ535209 TM1755.25.1 845 At3g54670 AtSMC1 AJ535209

218 An Inventory of ABC Proteins in Lotus japonicus [Vol. 13, Table 2. continued ORF name Amino acids Position ABC signature Arabidopsis Common name Rice chr2.cm0545.2 331 At3g54670 AtSMC1 AP004997 Ljwgs 017323.2 131 At3g47460 AtSMC2 AJ535210 Ljwgs 019256.1 756 At3g47460 AtSMC2 chr2.cm0545.2.3 215 At3g47460 AtSMC2 AJ535211 chr3.cm0460.15 1237 At3g47460 AtSMC2 AL662950 chr3.tm1419.19 577 At3g47460 AtSMC2 TM1742.7 1304 1 At5g48600 AtSMC3 AC108523 Ljwgs 035120.1 224 At5g62410 AtSMC4 AP006757 Ljwgs 075084.1.1 70 At5g62410 AtSMC4 AJ535210 Ljwgs 075572.1 63 At5g62410 AtSMC4 AJ535210 Ljwgs 097535.1 181 At5g62410 AtSMC4 Ljwgs 098168.1 256 At5g62410 AtSMC4 Ljwgs 118611.1 78 At5g62410 AtSMC4 AJ535210 chr5.cm0956.6.1 514 At5g62410 AtSMC4 chr6.cm0057.48 837 At5g62410 AtSMC4 AP002069 chr2.cm0545.2.2 284 At5g62410 AtSMC4 AJ535211 NAP, half-size proteins Ljwgs 057804.1 71 At4g04770 AtNAP1 AP003734 Ljwgs 082468.1 382 At4g04770 AtNAP1 AP003734 Ljwgs 144231.2 204 1 At1g67940 AtNAP3 TM0445.38.1 256 1 At1g67940 AtNAP3 AP003771 Ljwgs 133056.1 39 At1g03900 AtNAP4 AP002866 TM1442.8 529 At1g03900 AtNAP4 AC078829 Ljwgs 010991.3 100 At1g32500 AtNAP6 AP002860 Ljwgs 056225.1 297 At1g32500 AtNAP6 AP002860 chr4.cm0004.47 273 1 At3g10670 AtNAP7 Ljwgs 042529.1 104 At4g25450 AtNAP8 AP005188 Ljwgs 063180.1 90 At5g02270 AtNAP9 Ljwgs 064613.1 117 At5g02270 AtNAP9 AP004081 Ljwgs 012015.3 230 1 At1g63270 AtNAP10 AL606669 TM1678.4 195 1 At1g63270 AtNAP10 AL606669 TM1746.11.1 191 At1g63270 AtNAP10 AL606669 Ljwgs 040451.2 236 At1g65410 AtNAP11 chr1.cm1413.34 444 1 At1g65410 AtNAP11 AC133930 Ljwgs 092665.1 236 1 At2g37010 AtNAP12 AP006616 Ljwgs 010025.0.1 103 1 At4g33460 AtNAP13 AP003106 Ljwgs 011506.1 130 1 At4g33460 AtNAP13 AP003106 Ljwgs 022922.1 112 At4g33460 AtNAP13 AP003106 Ljwgs 069249.2 71 1 At4g33460 AtNAP13 AP003106 Ljwgs 133099.1 60 At5g14100 AtNAP14 ORFs of Lotus japonicus were classified into 10 subfamilies according to the nomenclatures of human ABC proteins and Sanchez- Fernandez et al. 
3

AtABCA1, eight of which showed high amino acid similarity to AtABCA1 (Table 2). Ljwgs 011569.2, Ljwgs 024406.1, Ljwgs 026106.1.1, Ljwgs 031377.1, Ljwgs 039697.1, Ljwgs 047252.1, Ljwgs 058746.1 and Ljwgs 058851.1 show similarity to the following regions of AtABCA1: amino acid positions 742–1037, 1–115, 1034–1105, 520–578, 412–518, 1159–1253, 689–740 and 1268–1329, respectively (see Supplementary Figure 1).
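The coordinates listed above can be laid out along AtABCA1 to see how the eight fragments tile the protein. The following is a minimal sketch, not the procedure used in this work: it assumes the listed ranges are inclusive, 1-based residue positions, and the names `FRAGMENTS`, `find_overlaps` and `covered_residues` are hypothetical helpers introduced only for illustration.

```python
# Fragment -> (start, end) on AtABCA1, taken from the text above.
# Assumption: ranges are inclusive, 1-based amino acid positions.
FRAGMENTS = {
    "Ljwgs 011569.2": (742, 1037),
    "Ljwgs 024406.1": (1, 115),
    "Ljwgs 026106.1.1": (1034, 1105),
    "Ljwgs 031377.1": (520, 578),
    "Ljwgs 039697.1": (412, 518),
    "Ljwgs 047252.1": (1159, 1253),
    "Ljwgs 058746.1": (689, 740),
    "Ljwgs 058851.1": (1268, 1329),
}


def find_overlaps(fragments):
    """Return (name_a, name_b, shared_residues) for every overlapping pair."""
    items = sorted(fragments.items(), key=lambda kv: kv[1])
    overlaps = []
    for i, (name_a, (_, end_a)) in enumerate(items):
        for name_b, (start_b, end_b) in items[i + 1:]:
            if start_b > end_a:
                break  # sorted by start, so no later fragment can overlap
            overlaps.append((name_a, name_b, min(end_a, end_b) - start_b + 1))
    return overlaps


def covered_residues(fragments):
    """Count reference positions covered by at least one fragment."""
    covered = 0
    reach = 0  # right-most position counted so far
    for start, end in sorted(fragments.values()):
        if end > reach:
            covered += end - max(start, reach + 1) + 1
            reach = end
    return covered


print(find_overlaps(FRAGMENTS))
print(covered_residues(FRAGMENTS))
```

Under these assumptions the only pair sharing positions is Ljwgs 011569.2 and Ljwgs 026106.1.1, which share positions 1034–1037. Whether the two fragments carry the same DNA sequence across that stretch has to be checked against the sequences themselves, which this sketch does not do.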

Table 3. Comparison of the ABC protein superfamily in plants and humans

Species      Genome (Mb)  ABCA  ABCA′  ABCB    ABCC  ABCD   ABCE  ABCF  ABCG  PDR  SMC  NAP  Total  Reference
Lotus        450          1     2      15 (3)  17    2 (1)  1     6     24    12   1    10   91     This work
Arabidopsis  125          1     16     27 (5)  15    2 (1)  2     5     29    15   4    15   131    (3,15)
Rice         440          0     7      28 (4)  17    3 (1)  2     5     30    21   4    10   125    (5), this work
Human        3000         12    0      11 (7)  13    4 (4)  1     3     5     0    0    0    49     (16)

Parentheses indicate the number of half-size ABC proteins for the subfamilies in which both full- and half-size members are classified as one group.

As each of those fragments showed similarity to a different part of AtABCA1, and no overlapping fragments with different DNA sequences were seen, it can be concluded that these eight fragments are derived from one ABCA protein of L. japonicus. Arabidopsis and rice contain 16 and 7 members of half-size ABCA′ proteins, respectively, 3,5 whereas in L. japonicus we found 34 fragments that show striking amino acid similarity to ABCA′ proteins of both model plants (Table 2). Among them, two fragments have an ABC signature, indicating that at least two ABCA′ proteins (TM1130.25 and TM1130.29) exist in the L. japonicus genome (Table 3). Some members of this subfamily have been studied intensively in humans due to their clinical importance. ABCA1, ABCA2 and ABCA4 have been reported to be involved in cholesterol transport, drug resistance and rod photoreceptor retinoid transport, respectively. 19 In plants, however, the physiological functions of these subfamily members are not clear, except for AtABCA1, which is preferentially expressed in the pollen of Arabidopsis (K. Yazaki; unpublished data) and seems to regulate pollen germination (personal communication from Dr C. Forestier of Cadarache Institute). Since all plant AtABCA1 orthologs found in EST databases are limited to dicots, e.g.
soybean (Glycine max), common bean (Phaseolus vulgaris), potato (Solanum tuberosum) and tomato (Lycopersicon esculentum), while only half-size members are observed in monocots, ABCA proteins may have a function specific to dicots, or ABCA′ members may complement the function of the ABCA member.

3.4. ABCB subfamily

The Arabidopsis ABCB subfamily consists of 22 full-size members, which are conventionally named MDR or PGP, and five half-size members that comprise 2 TAPs and 3 ATMs, 3 whereas 24 full-size (MDR) and 4 half-size (3 TAPs and 1 ATM) proteins are present in the rice genome. 5 In Lotus, we found 60 fragments that showed similarity to Arabidopsis full-size proteins, 23 of which had at least one ABC signature (Table 2). Since NBD1 and NBD2 of the Arabidopsis ABCB subfamily are not obviously distinguishable, as shown in Fig. 1, domain-based clustering analysis with the NBDs of Lotus and Arabidopsis did not show clear separation between NBD1 and NBD2 (Fig. 2A). We therefore used TMDs for domain-based clustering analysis, as a more reliable way to clarify the phylogenetic relationships of this subfamily in Lotus. A phylogenetic tree was constructed with 22 TMDs obtained from 18 fragments of Lotus and the TMDs of all Arabidopsis full-size ABCB proteins (Fig. 2B). From this result it is predicted that at least 12 full-size proteins of the ABCB subfamily are present in the Lotus genome, because 9 fragments contain TMD1, 7 fragments contain TMD2 and 3 fragments contain both TMD1 and TMD2. We also found four fragments showing striking similarity to Arabidopsis TAP and one fragment similar to the ATM half-size protein (Table 2). Two of the TAP-like fragments have an ABC signature, suggesting that two TAP proteins are present in Lotus, as in other plant species analyzed. In total, the number of Lotus ABCB proteins is estimated as 15, comprising 12 full-size MDR-type members, 2 TAP-like members and 1 ATM-like protein (Table 3).
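The arithmetic behind the "at least 12" estimate can be made explicit. The sketch below is one reading that reproduces the stated figure, not necessarily the authors' exact procedure, which relied on the phylogenetic tree. It assumes that each full-size protein carries exactly one copy of each domain, that every fragment derives from a single gene, and that the three fragment counts in the text are mutually exclusive categories. The function name `min_full_size_proteins` is hypothetical.

```python
from collections import Counter


def min_full_size_proteins(fragments):
    """Lower bound on gene number from per-fragment domain content.

    `fragments` is an iterable of collections of domain labels, one per
    fragment. If a full-size protein has one copy of each domain, the
    genome must encode at least as many proteins as the most frequently
    observed domain.
    """
    counts = Counter()
    for domains in fragments:
        counts.update(set(domains))  # count each domain once per fragment
    return max(counts.values(), default=0)


# ABCB counts from the text: 9 fragments with TMD1 only, 7 with TMD2 only,
# 3 with both (read here as exclusive categories, which is an assumption).
abcb_fragments = [{"TMD1"}] * 9 + [{"TMD2"}] * 7 + [{"TMD1", "TMD2"}] * 3
print(min_full_size_proteins(abcb_fragments))  # 12
```

Here TMD1 is seen in 9 + 3 = 12 fragments and TMD2 in 7 + 3 = 10, so the bound is 12. The bound is conservative: fragments carrying the same domain could still come from the same gene if they were redundant assemblies, which the sketch does not model.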
In humans, this subfamily comprises 11 members, including 7 half-size proteins. The most intensive studies of this subfamily have been done with ABCB1, also called MDR1, whose gene product was designated PGP. 20 This was the first eukaryotic ABC protein identified as the drug efflux pump responsible for multi-drug resistance in cancer cells. 21 Only one member of this subfamily in Arabidopsis, AtMDR1, also known as AtPGP1, was reported to confer herbicide tolerance when overexpressed in plants. 22 Our attempts to overexpress other MDR-type ABC transporters derived from various plants, including Arabidopsis, failed to confer a multi-drug resistance phenotype, suggesting that the name MDR is actually inappropriate as the general name for full-size ABCB subfamily members. It is noteworthy that this subfamily has attracted the particular attention of plant hormone researchers, because some full-size members of this subfamily have been implicated in polar auxin transport in plants. Geisler et al. 23 showed the AtPGP1-mediated efflux of auxin in yeast and HeLa cells, and counterparts of this ABC protein in maize and sorghum were reportedly also responsible for auxin transport in monocots. 24 Another member, AtPGP4, was recently reported as an auxin

transporter regulating basipetal transport in the root. 25,26 Although not all full-size ABCB proteins may be involved in auxin transport, the ABCB subfamily probably plays a central role in auxin transport together with the PIN family in Lotus as well. Plant full-size ABCB proteins include members that function as inward transporters, in contrast to other eukaryotic ABC proteins, which mediate the efflux of substrates from the cytosol. 27 The Lotus ABCB subfamily will provide further members for studying the mechanism that determines transport direction.

The human half-size proteins ABCB2 and ABCB3, also called TAP1 and TAP2, respectively, form a heterodimer that transports peptides into the ER lumen for peptide processing, resulting in their presentation on the cell surface by class I major histocompatibility complex molecules. 28 A barley half-size TAP-like protein, IDI7, was identified as an iron-deficiency-induced gene, 29 but its physiological functions, including the transport substrate, remain to be clarified. Another half-size protein, ABCB7 of humans, also known as ABC7, is an ortholog of yeast ATM1 and has been reported to be a mitochondrial protein that functions in the biogenesis of iron-sulfur proteins. 30 The Arabidopsis ATM protein AtATM3, also called STA1, was identified from the starik mutant, which showed dwarfism and chlorosis, and was reported to function in the biogenesis of iron-sulfur clusters. 31 Recently, AtATM3 was reported to be involved in heavy metal resistance. 32

3.5. ABCC subfamily

The ABCC subfamily consists of full-size ABC proteins conventionally designated as MRP, which often have an N-terminal extension of the TMD. There are 15 and 17 members in the ABCC subfamily in Arabidopsis and rice, respectively, 3,5 including a pseudogene, AtMRP15, in Arabidopsis. 17 In Lotus, we found 71 fragments with strong similarity to Arabidopsis MRP proteins (Table 2). Among them, 23 fragments have at least one ABC signature.
Domain-based clustering analysis was performed with the putative NBD regions of those fragments and those of Arabidopsis ABCC proteins (Fig. 2C). The analysis showed that 7 fragments contain NBD1, 15 fragments contain NBD2 and 2 fragments contain both NBD1 and NBD2. Taken together, it is estimated that 17 ABCC proteins exist in Lotus (Table 3). Domain-based clustering analysis with TMDs gave a similar result (data not shown).

As is the case for human ABCC proteins such as ABCC1, ABCC2 and ABCC3, 16 some plant ABCC proteins also recognize glutathione conjugates as their transport substrates. For instance, AtMRP1 shows substrate specificity for a wide variety of glutathione (GSH) conjugates of cadmium, dinitrophenol and metolachlor, as well as oxidized GSH, while both AtMRP2 and AtMRP3 transport chlorophyll catabolites in addition to several GSH conjugates. 33–35 In addition, an ABCC protein of maize has been reported to be involved in the transport of anthocyanin, probably in a GSH-dependent manner. 36 Human ABCC subfamily members also function as ion channels and/or channel regulators: ABCC7, also known as cystic fibrosis transmembrane conductance regulator (CFTR), is a chloride ion channel, 37 whereas ABCC8 and ABCC9, conventionally named sulfonylurea receptor 1 (SUR1) and SUR2, respectively, act as regulatory subunits of the potassium channel regulating insulin secretion 38 in an ATP-sensitive manner. Despite detailed analysis, plants do not seem to have bona fide counterparts of CFTR and SUR. However, AtMRP4 and AtMRP5 are involved in stomatal movement, 39–41 where ion channels play a central role, and the latter could bind sulfonylurea, 42 suggesting that these are, at least, functional counterparts of SUR proteins in Arabidopsis.

3.6. ABCD subfamily

ABC proteins localized at the peroxisome have characteristic sequences that are highly conserved only among themselves, and they are classified as the ABCD subfamily.
In humans, this subfamily consists of four members, which are all half-size ABC proteins. In contrast, Arabidopsis and rice contain one and two full-size ABCD proteins, respectively, in addition to one half-size protein in each plant. 3,5 In Lotus, four and three fragments have similarity to half-size and full-size ABCD proteins, respectively (Table 2). Although ABC signatures are not observed in those fragments, it is presumed that the Lotus genome sequence contains at least one half-size and at least one full-size ABCD protein (Table 3). Three independent research groups reported the full-size Arabidopsis ABCD protein at nearly the same time,

Figure 2. Domain-based clustering analysis. (A) The amino acid sequences of putative NBDs of Lotus fragments similar to ABCB proteins and Arabidopsis ABCB subfamily members were aligned using the ClustalW program. (B) The amino acid sequences of putative TMDs of Lotus fragments similar to ABCB proteins and Arabidopsis ABCB subfamily members were aligned using the ClustalW program. (C) The amino acid sequences of NBDs of Lotus fragments showing highest similarity to ABCC proteins and those of Arabidopsis NBDs of ABCC members were aligned using the ClustalW program. (D) The amino acid sequences of NBDs of Lotus fragments similar to ABCF proteins with at least one ABC signature and those of Arabidopsis NBDs of ABCF members were aligned using the ClustalW program. (E) The amino acid sequences of NBDs of Lotus fragments similar to PDR proteins with at least one ABC signature and those of Arabidopsis NBDs of PDR members were aligned using the ClustalW program. N1 and N2 represent NBD1 and NBD2, respectively, and T1 and T2 represent TMD1 and TMD2, respectively. Each number before the hyphen corresponds to the nomenclature of the ABCB member of Arabidopsis according to Sanchez-Fernandez et al. 3 "Ljwgs" refers to sequences assembled from the whole genome shotgun, and "TM" and "CM" refer to sequences from TAC and BAC clones. Lotus genes are shown in red.