microRNA Discovery Sequencing Data Report. Prepared for Dr. Researcher by LC Sciences, LLC, Jan. 1, 2009


Sequencing Data Report
microRNA Discovery Sequencing Service
On sample_
For Dr. Researcher, Life Sciences, University of USA
Prepared by LC Sciences, LLC, Jan. 1, 2009
LC Sciences, LLC, 2575 W. Bellfort, Suite 270, Houston, Texas

microRNA Discovery Sequencing Data Report: sample_

I. PROJECT INFORMATION

Project-related information is listed in Table 1.

Table 1. Sample, service, and project tracking information

Customer Sample Name: sample_
Sample Type: Human total RNA
Date Sample Received: 12/15/2009
Service Requested: microRNA Discovery Sequencing Service
Data Analysis Requested: Standard Data Analysis
LCS Project Number:
LCS Sample ID: sample1

II. DATA REPORT

The received RNA sample was processed to generate a cDNA library, which was then used for deep sequencing. The data generated were analyzed, and the full data files (2-3 Gb) were saved onto a DVD, which is included with this report. Experimental procedures and analysis methods are described in Section III of this report. The statistics of the data analysis are given in the file Data_summary_sample1.xls, and a summary is presented in Table 2. The detailed data files, which may be tens of Mb in size, and the recommended software programs for reviewing the data are listed in Table 3.

Terminologies Used

Sequ Seq: Raw sequencing read generated after image extraction and base-calling
Unique Seq: Family of sequ seqs with the same sequence
Copy Number: Number of sequ seqs in the same unique seq family
Count: Number of sequ seqs in the same unique seq family (synonymous with copy number)
Mapping: Aligning a sequence to a reference database
mir: pre-miRNA registered in miRBase
miR: mature miRNA registered in miRBase

LC Sciences, LLC, support@lcsciences.com, 2575 W. Bellfort, Suite 270, Houston, Texas

Table 2. A summary of standard data analysis results

Category | #SequSeq | %SequSeq | #UniqueSeq | %UniqueSeq
Raw | 6,870,, | % | 1,445,905 | %
Mappable | 4,764,, | % | 40,890 | 2.83%
Mapped to miRBase (including nohit 1) | 4,200,, | % | 8,971 | 0.62%
Mapped to Cluster I | 4,181,, | % | 7, | 0.55%
Mapped to Cluster II | | % | | 0.00%
Mapped to Cluster III | 2, | % | | 0.01%
Mapped to Cluster IV | 13, | % | ,128 | 0.08%
Mapped (total) | 4,196,, | % | 9,201 | 0.64%
Nohit (including nohit 1 and nohit 2) | 567, | % | 31,689 | 2.19%

Figure: Flow chart of sequencing data analysis of a single sequencing reaction through various filters and the number of miRNAs detected. Labels shown in the chart:

7.42% mRNA, RFam, Repbase filter
1.35% ADT filter
0.05% Junk filter
0.07% Sequence pattern filter
6.07% length<15 or >
% copy#<3
9,031,007 reads (91%) are mappable
7,944,551 reads (88%) passed optional filter
4,722,478 reads (59%) are mapped to or are miRNA candidates
41% unmapped
39% (group 1), 300 miRNAs detected
4% (group 2), 27 miRNAs detected
19% (group 3), 149 miRNAs detected
38% (group 4), 292 miRNAs detected

Table 3. Data files delivered and programs recommended for reviewing

Folder | Data File | Description | Reviewing Program
Raw Data | sample1_rawdata.txt | Sequencing sequences (sequ seqs) as obtained from sequencer | Wordpad
Filtered Data | sample1_fhg_unique.txt | Sequ seqs listed by family (unique seqs) | Wordpad
Filtered Data | sample1_fhg_pass.txt | Unique seqs passed digital filters | Wordpad
Filtered Data | sample1_fhg_long.txt | Unique seqs with length >= 15 | Wordpad
Filtered Data | sample1_fhg_short.txt | Unique seqs with length < 15 | Wordpad
Filtered Data | sample1_fhg_hc.txt | Unique seqs with copy number >= 3 | Wordpad
Filtered Data | sample1_fhg_lc.txt | Unique seqs with copy number < 3 | Wordpad
Filtered Data | sample1_fhg_db.fa | Final mappable unique seqs | Wordpad
Mapped Data | sample1_fhg_gp1_align.txt | Cluster I: see Table 4 | Wordpad
Mapped Data | sample1_fhg_gp1_mirlist.txt | Cluster I: see Table 4 | Excel
Mapped Data | sample1_fhg_gp1_sum.txt | Cluster I: see Table 4 | Excel
Mapped Data | sample1_fhg_gp2_align.txt | Cluster II: see Table 4 | Wordpad
Mapped Data | sample1_fhg_gp2_mirlist.txt | Cluster II: see Table 4 | Excel
Mapped Data | sample1_fhg_gp2_sum.txt | Cluster II: see Table 4 | Excel
Mapped Data | sample1_fhg_gp3_align.txt | Cluster III: see Table 4 | Wordpad
Mapped Data | sample1_fhg_gp3_mirlist.txt | Cluster III: see Table 4 | Excel
Mapped Data | sample1_fhg_gp3_sum.txt | Cluster III: see Table 4 | Excel
Mapped Data | sample1_fhg_gp4_align.txt | Cluster IV: see Table 4 | Wordpad
Mapped Data | sample1_fhg_gp4_mirlist.txt | Cluster IV: see Table 4 | Excel
Mapped Data | sample1_fhg_gp4_sum.txt | Cluster IV: see Table 4 | Excel
Mapped Data | sample1_fhg_uni_mirs.txt | The list of all unique seqs from Cluster I to IV | Wordpad
Mapped Data | sample1_fhg_nohit.txt | Unique seqs having no hit with reference libraries or the genome | Wordpad
Summary | sample1_fhg_clusterposition.txt | Genomic chromosomal positions of the mapped unique seqs | Excel
Summary | sample1_fhg_mirdistribution.png | Plot of position of mapped unique seqs inside genome | Paint
Summary | Data_summary_sample1.xls | Statistics of data analysis at the various steps and the final results | Excel

III. METHODS AND EXPERIMENTS

A. Small RNA Library Construction

A small RNA library was generated from the customer sample according to Illumina's sample preparation instructions (ref. 1). The procedures performed are summarized below.

1. Small RNA Isolation by Denaturing PAGE Gel

The received total RNA sample was size-fractionated on a 15% tris-borate-EDTA-urea polyacrylamide gel. The RNA fragments of length nts were isolated, quantified following gel elution, and ethanol precipitated.

2. 5' and 3' Adapter Ligation

The SRA 5' adapter (Illumina) was ligated to the aforementioned RNA fragments with T4 RNA ligase (Promega). The ligated RNAs were size-fractionated on a 15% tris-borate-EDTA-urea polyacrylamide gel, and the RNA fragments of ~41-76 nts were isolated. The SRA 3' adapter (Illumina) ligation was then performed, followed by a second size-fractionation under the same gel conditions. The RNA fragments of ~64-99 nts were isolated through gel elution and ethanol precipitation.

3. Reverse Transcription and PCR Amplification

The ligated RNA fragments were reverse transcribed to single-stranded cDNAs using M-MLV (Invitrogen) with the RT primers recommended by Illumina. The cDNAs were amplified with pfx DNA polymerase (Invitrogen) in 20 cycles of PCR using Illumina's small RNA primer set.

4. Purification of Amplified cDNA Library for Sequencing

The PCR products were purified on a 12% TBE polyacrylamide gel, and a gel slice of ~ bps was excised. This fraction was eluted, and the recovered cDNAs were precipitated and quantified on a Nanodrop (Thermo Scientific) and on a TBS-380 mini-fluorometer (Turner Biosystems) using Picogreen dsDNA quantitation reagent (Invitrogen). The concentration of the sample was adjusted to ~10 nM, and a total of 10 µL was used in the sequencing reaction.

B. Deep Sequencing

The purified cDNA library was used for cluster generation on Illumina's Cluster Station and then sequenced on an Illumina GAIIx following the vendor's instructions for running the instrument. Raw sequencing reads were obtained using Illumina's Pipeline v1.5 software following sequencing image analysis by the Pipeline Firecrest module and base-calling by the Pipeline Bustard module. The extracted sequencing reads were stored in the file sample1_rawdata.txt and were then used in the standard data analysis, which is described in the next section.

C. Standard Data Analysis

A proprietary software package, ACGT101-miR v3.x (LC Sciences), was used for standard data analysis. The key functions performed by this software and the relevant analysis results are described here.

5. Obtaining Mappable Sequences from Raw Sequencing Data

After the raw sequence reads, or sequenced sequences (sequ seqs), were extracted from image data, a series of digital filters (LC Sciences) was employed to remove various un-mappable sequencing reads. A FASTA file named sample1_fhg_db.fa was generated and used for mapping.

a. Generating Unique Families of Sequ Seqs by Sorting Raw Sequencing Reads

In this step, identical sequ seqs in the raw data file were counted, and a file of unique families of sequences (unique seqs), sample1_fhg_unique.txt, was generated. A typical entry of this file is shown below:

23 TTTGTCGG GTCTTTGGATATGCCGTGTGACAATGGTGG 1,8560

where 23 is the index of this sequence, followed by the sequ seq, and the last number is the count (copy number) of the sequ seq.

b. Generating Mappable Sequ Seqs

In this step, impurity sequences arising from sample preparation, sequencing chemistry and processes, and the optical digital resolution of the sequencer detector were removed, leaving the sequ seqs that were used for mapping against the reference database files. The remaining sequ seqs were grouped by family (unique seqs) and stored in the file sample1_fhg_pass.txt.

c. Filtering Unique Seqs by Length

In this step, unique seqs were separated into two groups based on their sequence lengths. Unique seqs of at least the cut-off length (default = 15 nts for microRNA discovery) were saved in the file sample1_fhg_long.txt, while shorter ones were saved in the file sample1_fhg_short.txt.
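Steps a and c can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ACGT101-miR implementation: the function names `collapse_reads` and `split_by_length` and the toy reads are hypothetical, and the `>=` boundary follows the "length >= 15" description in Table 3.

```python
from collections import Counter


def collapse_reads(reads):
    """Group identical reads (sequ seqs) into unique seqs with copy numbers."""
    counts = Counter(reads)
    # Most abundant family first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def split_by_length(unique_seqs, cutoff=15):
    """Split (sequence, count) pairs into long (>= cutoff) and short groups."""
    long_seqs = [(seq, n) for seq, n in unique_seqs if len(seq) >= cutoff]
    short_seqs = [(seq, n) for seq, n in unique_seqs if len(seq) < cutoff]
    return long_seqs, short_seqs


# Toy reads, invented for illustration only.
reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "ACGTACGT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "ACGTACGT",
    "TTTGGCAATGGTAGAACTCACA",
]

unique = collapse_reads(reads)
long_seqs, short_seqs = split_by_length(unique)

for index, (seq, count) in enumerate(unique, start=1):
    print(index, seq, count)
```

Each printed row mirrors the index, sequence, count layout of sample1_fhg_unique.txt; the long and short lists correspond to the contents of the long and short files.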

d. Filtering Unique Seqs by Copy Number

In this step, unique seqs were further sorted by copy number. Those with copy numbers of at least a predefined cut-off (default = 3) were stored in the file sample1_fhg_hc.txt, while those with fewer copies were stored in the file sample1_fhg_lc.txt, where hc means high copy and lc means low copy.

e. Removing Unique Seqs from Certain Known RNA Reference Databases

Standard procedures were employed to remove the unique seqs that mapped to mRNA, RFam, and Repbase. The unique seqs that passed this filter were saved in the file sample1_fhg_db.fa.

6. Mapping Mappable Unique Seqs to mirs and Genome

In this section, unique seqs were mapped against the pre-miRNA (mir) and mature miRNA (miR) sequences listed in the latest release of miRBase (refs. 2, 3, 4), or against the genome, based on the public releases for the appropriate species. The mirs of interest were also mapped against the genome sequence. The methods and criteria used for the various mappings are documented in the ACGT-101 User's Manual (ref. 5). Brief descriptions of the analyses are presented below, and the characteristics of the various groups of unique seqs are summarized in Table 4.

a. Mapping Unique Seqs to mirs in miRBase

The cleaned unique seqs in sample1_fhg_db.fa were blasted against the mirs in miRBase. The mapped unique seqs were grouped as "unique seqs mapped to mirs in miRBase", while the remaining ones were grouped as "unique seqs un-mapped to mirs in miRBase".

b. Mapping mirs Mapped by Unique Seqs to Genome

The mirs to which the unique seqs in the "unique seqs mapped to mirs in miRBase" group were mapped were further blasted against the genome. The mirs that mapped to the genome were sorted out, and the unique seqs associated with these mirs were grouped as "unique seqs mapped to mirs that further mapped to genome". This group of unique seqs was categorized as Cluster I and saved in the file sample1_fhg_gp1_mirlist.txt. Their alignments are presented in the file sample1_fhg_gp1_align.txt. A summary file was also generated and saved as sample1_fhg_gp1_sum.txt. The unique seqs mapped to mirs that did not map to the genome were grouped as "unique seqs mapped to mirs that un-mapped to genome".
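To make the idea of mapping a unique seq to a precursor concrete, here is a toy sketch. It is not BLAST and not the ACGT101-miR method: it simply slides the read along a reference without gaps and accepts the best placement with at most one mismatch, loosely echoing the 0-error and 1-error tallies that appear in the alignment printouts later in this report. The function names and the example sequences are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_to_reference(seq, reference, max_mismatches=1):
    """Return (offset, mismatches) for the best ungapped placement, or None.

    Every window of len(seq) in the reference is scanned; the placement with
    the fewest mismatches wins, and the leftmost one is kept on ties.
    """
    best = None
    for offset in range(len(reference) - len(seq) + 1):
        window = reference[offset:offset + len(seq)]
        mismatches = hamming(seq, window)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (offset, mismatches)
    return best


# Invented precursor: a 22-nt mature-like sequence with short flanks.
mature = "TGAGGTAGTAGGTTGTATAGTT"
precursor = "GGC" + mature + "TGGAACC"

exact_hit = map_to_reference(mature, precursor)
one_error_hit = map_to_reference("TGAGGTAGTAGGTTGTATAGTA", precursor)
no_hit = map_to_reference("A" * 22, precursor)
print(exact_hit, one_error_hit, no_hit)
```

A real pipeline would use an indexed aligner and handle gaps, both strands, and adapter remnants; the sketch only shows how a mismatch threshold separates mapped from un-mapped unique seqs.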

c. Mapping Unique Seqs with mirs Un-mapped to Genome to Genome

The unique seqs in the group "unique seqs mapped to mirs that un-mapped to genome" were blasted against the genome. The unique seqs that mapped to the genome here were grouped as "unique seqs mapped to mirs and genome but mirs un-mapped to genome", categorized as Cluster II, and saved in the file sample1_fhg_gp2_mirlist.txt. Their alignments are presented in the file sample1_fhg_gp2_align.txt. A summary file was also generated and saved as sample1_fhg_gp2_sum.txt. The remaining unique seqs were grouped as "unique seqs mapped to mirs but neither unique seqs nor their mirs mapped to genome".

d. Mapping Unique Seqs to miRs

The unique seqs in the group "unique seqs mapped to mirs but neither unique seqs nor their mirs mapped to genome" were further categorized according to whether they mapped to any mature miRNAs (miRs) within the mirs to which they had been mapped. All unique seqs in this group that mapped to miRs were grouped as "unique seqs mapped to mirs and miRs but neither unique seqs nor their mirs mapped to genome", categorized as Cluster III, and saved in the file sample1_fhg_gp3_mirlist.txt. Their alignments are presented in the file sample1_fhg_gp3_align.txt. A summary file was also generated and saved as sample1_fhg_gp3_sum.txt. The rest of the unique seqs in the group, which did not map to miRs, were grouped as "unique seqs mapped to mirs but not to miR and neither unique seqs nor their mirs mapped to genome" and termed "unique seqs nohit 1".

e. Mapping Unique Seqs Un-mapped to mirs to Genome

Unique seqs in the group "unique seqs un-mapped to mirs in miRBase" were blasted directly against the genome, and those that mapped to the genome were identified. The extended sequences of the mapped genome sequences were tested for possible formation of stable hairpins. When stable hairpins were predicted, the associated unique seqs were grouped as "unique seqs un-mapped to mirs but mapped to genome with possible hairpin formation". These unique seqs were categorized as Cluster IV and saved in the file sample1_fhg_gp4_mirlist.txt. Their alignments are presented in the file sample1_fhg_gp4_align.txt. A summary file was also generated and saved as sample1_fhg_gp4_sum.txt.

All unique seqs in Clusters I to IV were listed in sample1_fhg_uni_mirs.txt as mapped or predicted miRNAs.

The unique seqs that mapped neither to mirs in miRBase nor to the genome were grouped as "unique seqs un-mapped to mirs and genome" and termed "unique seqs nohit 2". The unique seqs in both groups, nohit 1 and nohit 2, were combined as "unique seqs nohit" and saved in the file sample1_fhg_nohit.txt.

f. Plot of the Chromosomal Genomic Positions of the Mapped Unique Seqs

The genomic positions of the Cluster I to IV sequences were mapped to chromosomes, and the results were saved in the file sample1_fhg_clusterposition.txt and displayed in the plot file sample1_fhg_mirdistribution.png.

Table 4. Summary of mapping of unique seqs to mirs, miRs, and genome*

Columns in the original table: Clusters, Group Description, Unique seqs Mapped* (mir, miR, Genome), Comments.

Cluster I: Unique seqs mapped to mirs that further mapped to genome.
Cluster II: Unique seqs mapped to mirs and genome but mirs un-mapped to genome. Comment: mirs un-mapped to genome.
Cluster III: Unique seqs mapped to mirs and miRs but neither unique seqs nor their mirs mapped to genome. Comment: mirs un-mapped to genome.
Cluster IV: Unique seqs un-mapped to mirs but mapped to genome with possible hairpin formation.
Unique seqs nohit (nohit 1): Unique seqs mapped to mirs but not to miR and neither unique seqs nor their mirs mapped to genome. Comment: mirs un-mapped to genome.
Unique seqs nohit (nohit 2): Unique seqs un-mapped to mirs and genome.

* Note: in the original table, one mark indicates that a unique seq was mapped to the mir, miR, or genome, and another mark indicates that it was not.
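The grouping logic of Section 6 and Table 4 can be summarized as a small decision function. This is a sketch under stated assumptions rather than the ACGT101-miR code: the function name, its boolean parameters, and the returned labels are hypothetical, and the report does not name the case of a unique seq that maps to the genome without a predicted hairpin, so that case is returned as "unassigned" here.

```python
def classify_unique_seq(maps_to_mir, mir_maps_to_genome, seq_maps_to_genome,
                        maps_to_mature, hairpin_predicted):
    """Assign a unique seq to Cluster I-IV or a nohit group.

    maps_to_mir:         the unique seq maps to a pre-miRNA (mir) in miRBase
    mir_maps_to_genome:  that mir itself maps to the genome
    seq_maps_to_genome:  the unique seq maps directly to the genome
    maps_to_mature:      the unique seq maps to a mature miRNA (miR) in the mir
    hairpin_predicted:   the extended genomic locus can form a stable hairpin
    """
    if maps_to_mir:
        if mir_maps_to_genome:
            return "Cluster I"
        if seq_maps_to_genome:
            return "Cluster II"
        if maps_to_mature:
            return "Cluster III"
        return "nohit 1"
    if seq_maps_to_genome:
        if hairpin_predicted:
            return "Cluster IV"
        # Not named in the report; labeled here as an assumption.
        return "unassigned"
    return "nohit 2"


examples = {
    "known miRNA with genomic precursor": classify_unique_seq(True, True, True, True, False),
    "novel candidate with hairpin": classify_unique_seq(False, False, True, False, True),
    "no hit anywhere": classify_unique_seq(False, False, False, False, False),
}
for description, group in examples.items():
    print(description, "->", group)
```

Ordering the checks this way mirrors the order of steps a to e: mir mapping is tested first, genome mapping of the mir second, and hairpin prediction only for unique seqs with no mir hit.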

Figure: Length Distribution of Mappable Data. Bar chart of the total number of reads at each sequence length (nt), accompanied by a table with rows for Length (nt), Total # of Reads, % of Total Reads, # of Unique Seqs, and Reads # / Unique seqs. Bar labels include 3,533,567 and 2,203,539 reads, and the total across all lengths is 7,944,551 reads.

Figure: Chromosomal location of pre-miRNAs. The relative locations of individual pre-miRNAs (mir) are shown across the 19 chromosomes. MID (Maximum Inter-Distance) is the maximum distance between any two pre-miRNAs on the same chromosome that are considered to be in the same cluster. Fifty-nine clusters (black dots) are obtained when the MID is limited to 50 kb.
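Grouping pre-miRNA positions by a maximum inter-distance can be sketched as follows. The sketch assumes that clustering chains neighboring positions on one chromosome whenever the gap to the previous position is below the MID, which is one reading of the definition above; the function name, the strict `<` comparison, and the toy coordinates are assumptions, not details taken from the software.

```python
def cluster_positions(positions, mid=50_000):
    """Chain sorted positions into clusters while neighbor gaps stay below mid."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < mid:
            # Close enough to the previous pre-miRNA: extend the current cluster.
            clusters[-1].append(pos)
        else:
            # Gap too large (or first position): start a new cluster.
            clusters.append([pos])
    return clusters


# Invented start positions (bp) on a single chromosome.
starts = [200_000, 100, 90_000, 30_000, 10_000_000]
clusters = cluster_positions(starts)
multi_member = [c for c in clusters if len(c) > 1]
print(clusters)
print(multi_member)
```

Whether single-member groups count as clusters is not stated in the report, so the sketch returns all groups and leaves that filter to the caller.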

IV. REFERENCES

1. Preparing Samples for Analysis of Small RNA, Illumina Inc., Part # Rev. A, 2008.
2. Griffiths-Jones, S., Saini, H.K., van Dongen, S., Enright, A.J., miRBase: tools for microRNA genomics, 2008, Nucleic Acids Research, 36, D154-D158.
3. Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., Enright, A.J., miRBase: microRNA sequences, targets and gene nomenclature, 2006, Nucleic Acids Research, 34, D140-D144.
4. Griffiths-Jones, S., The microRNA Registry, 2004, Nucleic Acids Research, 32, D109-D111.
5. LC Sciences ACGT 101 manual.

Example printouts of the included sequence data files are attached. The data files included with each sample report are listed in Table 3. These printouts are truncated sample data files.

sample1_rawdata

>ILLUMINA-57021F:7:1:3:1127#0/1
ACATTGGGTTCTCATTCAAATATACTTTTGAAGTATGTGC
>ILLUMINA-57021F:7:1:3:352#0/1
ACGAAGAGGGAGCGCAATNNNTCAGTATATATTGAAGGAC
>ILLUMINA-57021F:7:1:3:804#0/1
TCTGAGCAGTGACTAGNACCCGTAATANGAGGTGAGCAGC
>ILLUMINA-57021F:7:1:3:2003#0/1
CAACAAGGGTAAGTTAATGCAATCGCCCCTCCNNAAAGGG
>ILLUMINA-57021F:7:1:3:399#0/1
TTAAGGTAATTAGCGTGGGCGGTAGCGCTCTGTATAAGCT
>ILLUMINA-57021F:7:1:3:921#0/1
TAAGCAGGCCATGCCGCTACGNNGGGGATAAATCTGGCTG
>ILLUMINA-57021F:7:1:3:1598#0/1
TATTGGGAGGGAATAGATCGCTGNCCAGCCTGATNTAAGG
>ILLUMINA-57021F:7:1:3:1981#0/1
GCCGCCTGCCTAGGTCTTCTTATCTTGAGAATGAGTCAAG
>ILLUMINA-57021F:7:1:3:118#0/1
TGAGCACACATTACATAATGCGGCTACTGTTTGACAAAGT
>ILLUMINA-57021F:7:1:3:1347#0/1
CTCGATGTAAGAGATCACTATTTCGCCACATGGTATTCCG
>ILLUMINA-57021F:7:1:3:185#0/1
TAAGGGACCTATTGTCAGCGGATCACAATGTCTTGAAGGA
>ILLUMINA-57021F:7:1:3:824#0/1
TTTGCAAACCGATGTCAGTGGCACTGCAAATGTCCACTGT
>ILLUMINA-57021F:7:1:3:1603#0/1
CAAAACAAACGTAATACGGCGGTATCCCACTAGAGTTGTC
>ILLUMINA-57021F:7:1:3:613#0/1
TCCCATGAATCAGGCCNGACTAGGGAAANTTCAATCAGAC
>ILLUMINA-57021F:7:1:3:1459#0/1
GCCCGCCGTTATGGACAATAAGTAAATTGCTACAATTGAC
>ILLUMINA-57021F:7:1:3:494#0/1
GCTGTTCTAGAAAATGGTTTATCTATTCCTGCGTCAATCT
>ILLUMINA-57021F:7:1:3:1747#0/1
TGGAAAACTCTATTAGAGTCTAACTTATCCAATGCGCACG
>ILLUMINA-57021F:7:1:3:1574#0/1
AGTAAATNATANAAATAAAATTAAAAAAAAAANAAAAAAA
>ILLUMINA-57021F:7:1:3:1448#0/1
CTATGTACAGCCACTCTCTTGATGGCGGGAAATATTTATT
>ILLUMINA-57021F:7:1:3:149#0/1
GGTGCTGGATTCCCGTTTTGCGTATTTTGGGAGAGGTCCA
>ILLUMINA-57021F:7:1:3:460#0/1
TAGGCTGTTTGCTACATTTTGAGACAAACTGTATAGAGTG
>ILLUMINA-57021F:7:1:3:1436#0/1
AATGGCGGAGCGATTTATAGGGAGAGGGGCGATTGGCTCG
>ILLUMINA-57021F:7:1:3:1847#0/1
GACTATCTGCCTGTAGCGGATAAGGCAGCATCCAACCTAA
>ILLUMINA-57021F:7:1:3:909#0/1
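The raw-data printout above follows a FASTA-like layout: a header starting with ">" followed by the read. A minimal parser is sketched below, using two records copied from the printout. The function name `parse_fasta` is hypothetical, and counting reads that contain an uncalled base (N) is shown only as an example of a simple per-read check, not as one of the digital filters used in the report.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


sample = """\
>ILLUMINA-57021F:7:1:3:1127#0/1
ACATTGGGTTCTCATTCAAATATACTTTTGAAGTATGTGC
>ILLUMINA-57021F:7:1:3:352#0/1
ACGAAGAGGGAGCGCAATNNNTCAGTATATATTGAAGGAC
"""

records = parse_fasta(sample)
reads_with_n = [header for header, seq in records if "N" in seq]
print(len(records), "records;", len(reads_with_n), "with at least one N")
```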

15 sample1_fhg_unique # of input sequences: # of families: ; / =21.0% Index seq count 1 ATTATGGTACTTGTATTTAACAGGCTCACT CCTCTTATGTAGACCGTTGTCCAGTGGTGA ATTTTTATAGTACCAAGAGGCTACGCAGT GTAAAAAGTCTATCGCCGCACTGTCGTCA CTTAACGGTTCTAACTATTCACCGGTAAAG CAAAGAAGCCATAGGCGCCCGGGAACACC AACCGTGAGGCGCTGGAAAGACGCTAGAAG AGTAGTACTGGCGTACACATTCTCCACGG CATCCTATTCTAGCAATCAGGAGAACATTC AGGCCCATATCAAGAAGTAGAACTATCGA ACATGAATGGCGAATGCTTCCCGTGATA GTTAGCTAGTGCCCGGTTTTATCAAGCCC CGCGAATGTCTTCGTATGCTCAGGTAGC CATGAGAGGTCGAGGGACTTGATTCCTAC CTTCTGGCACCGTGGGCCAGCGGAAGGACA TGCCCCAACGACGCGGAAAATCAAGCGAGC CTTCTGGAAGCCAACGCTCTGGCGGGATCC AAACAAATTAACTCGACGACCTCTCCTCT AAACTATGCATTATTTCCCCCTAAGATCT CATATCTGTCTCCTACCAGATTATCACCCC TGTTTGCTGTGGCATTTTTCCCATGGATTA ATTATTAACGTGGTGTGGTAAATAGAGGGT GGTTCATGCCTAAATTGCATCTATAATA GAGTCGTAACGCTACCCTATACGAAGCG CCTTGAATGTGACCCTGAGGCTTCTATTAG AAGAAGCGTCAACCCCCACGTCAAGACGT TCCGTTGCTAGCCGGAGGACCTCCTGGT CCGCAGAGCCAGGCACTATGTCAGGGGCTA CATATGGCAAAAGACCGGACTGGACGCGA ACATCAAGAATCCTCAATCCTACGTGGACG AAGAGCCGGAGAACACATGATGGAGGCGAC TGCGATAAAAACGGTGATGACCAAAGAACA TGATCACGAAAAGTTGCTTGACAAGGTT CTGGTTAAGCACCCCCTGGTGGTGCTGCCT TGGGGCTACCGGGGCCTACACGCACCCAT GTTTATACTATTAATATGCAATGGTGACT GCCCCAAGCGTAGGTTGGGGGTCCGTTCG GCCCGGAGTCAGATGACTCGCTTACGTG CACGCTAGAAGGTGCTAGGGCTAGCTCTTT GTCATGGGAGCATTCATGCCGCGACGCAC CGCCTTTACTCCTGGAAGATATGACATGA GTAAACCCCGGGCGGTGCAACCACAGGCGG TGGTCTGGGCATTGTGCTTGAGCACACTTA 67972

16 sample1_fhg_pass # of input seq: # of input family: # of seq after filter: ; / =95.0% # of family after filter: ; /987972=83.6% # of seq with repeat fragment: ; / =5.0% # of family with repeat fragment: ; /987972=16.4% # of seq of >=7A,>=8C,>=6G,>=7T: ; / =5.0% # of family of >=7A,>=8C,>=6G,>=7T: ; /987972=16.4% # of seq of >10 dimer: 5; 5/ =0.0% # of family of >10 dimer: 5; 5/987972=0.0% # of seq of >6 trimer: 18; 18/ =0.0% # of family of >6 trimer: 18; 18/987972=0.0% # of seq of >5 tetramer: 8; 8/ =0.0% # of family of >5 tetramer: 8; 8/987972=0.0% Index seq count 1 TAGCACTCAAGTGTTTTGCACTGG AGAACGGTTTGCTATTTCTG TAACGGGGGTCACCTTCGGCAG CTCGAATTTTTCCAATCAC GTATGCATAATTGCAAGCACAT TTGCCTCACTCGTACAAAAGGCC GGATCAGGACCCGACTCCACATTAG AGAGGCCACCAAGATCTTAGGCC AAACAGCGTCTCAGTGTAATTG TCTGTCTATCTTCTTAAT CTGTCCCCCTTGTCTGATACA ACAAATGGCGGACGCGAATC GGCAAGTCTTCCTGCGA TGATGGAGGAACGTAAACGT GTCCACATCATCCCGGGGTCG TCGCGTATTGCTATTTAGGA ATCAACAACCAATCTCGATAT TATCAGTAACGTACATGCCCCCGAT ACTCCTGGAGGTCTCGCTCGTCTA AATCGAATCTTCGATACGTCGT CGTGGAGGGAGGCCGTCAGTTT GTCCATCTAGCCAATAGGC AGTAACCAACCGTGAGAGTGTTGGC GCTGGTTTTTGGAGCATG AGTAGGTCTGTAAGGGGT CAGTAAACGAAAGGACCGAGACT TTACACAAACATATCAGCGAT CTCGTTATTAGTAATACTC TAAGTGGTAAATTAACCGTTACACC TCGCGAGCTGACCAGTATCACG 23074
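The header of sample1_fhg_pass lists counts for homopolymer runs (>=7A, >=8C, >=6G, >=7T) and for dimer, trimer, and tetramer repeats. A possible way to flag such low-complexity reads is sketched below. The thresholds are read from that header, but their interpretation as consecutive runs and consecutive repeat units is an assumption, and the function name and flag labels are hypothetical.

```python
import re

# Homopolymer runs: at least 7 A, 8 C, 6 G, or 7 T in a row.
HOMOPOLYMER = re.compile(r"A{7,}|C{8,}|G{6,}|T{7,}")
# A unit followed by N or more copies of itself gives N + 1 consecutive units.
DIMER = re.compile(r"([ACGT]{2})\1{10,}")    # more than 10 dimer units
TRIMER = re.compile(r"([ACGT]{3})\1{6,}")    # more than 6 trimer units
TETRAMER = re.compile(r"([ACGT]{4})\1{5,}")  # more than 5 tetramer units

CHECKS = [
    ("homopolymer", HOMOPOLYMER),
    ("dimer", DIMER),
    ("trimer", TRIMER),
    ("tetramer", TETRAMER),
]


def repeat_flags(seq):
    """Return the names of all repeat patterns found in seq, in CHECKS order."""
    return [name for name, pattern in CHECKS if pattern.search(seq)]


poly_a_read = "AGTAAATNATANAAATAAAATTAAAAAAAAAANAAAAAAA"  # from the raw-data printout
dimer_read = "AC" * 11        # invented example
clean_read = "TGAGGTAGTAGGTTGTATAGTT"  # invented example

for read in (poly_a_read, dimer_read, clean_read):
    print(read, repeat_flags(read))
```

A read with any flag would be counted under "with repeat fragment" in this reading of the header; reads with an empty flag list would pass.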

17 sample1_fhg_gp1_align input file: sample1/miralign/sample1_fhg_align.txt for mirs input file: sample1/db/sample1_fhg_db.fa for sequ seq input file: sample1/output/sample1_fhg_gp1.txt for cluster & genomeseq Conventions: 1. the. in alignments means that the base is same as that in reference. 2. the * in alignments means that the base is same as that in reference, but the * is the mature part of precursor. The capital bases in * region also belong to mature. 3. the * in #error means that this sequseq has deletion compared with reference. 4. the + in #error means that this sequseq is without 3ADT cut, and the previous part of the sequseq is mapped to the reference and the other part is removed. 5. full length precursor and sequseq (except for the sequseq without 3ADT cut, which is indicated by + in #error) are listed below. clusterno=1 chr=1 gi=nt_ strand=1 #mirs=7 #copy(all)=3832 #family(all)=37 #copy(0error)=1167 #family(0error)=12 #copy(1error)=2665 #family(1error)=25 genome GTATGCCTTAACAGCAAGCGCAGTAGCGTAGCGACTGGGCATGAACGCGACGTTGATGAACTCGTAAGTTCTTCCACAAGTCTGACCGTCGTATAAG #error hsa-mir-xxe 1...**********************...********************** hsa-mir-xxe,hsa-mir-xxe* ptr-mir-xxe 1...********************** ptr-mir-xxe mml-mir-xxe 1...a...********************** mml-mir-xxe mmu-mir-xxe 1...**********************...c...********************** mmu-mir-xxe,mmu-mir-xxe* bta-mir-xxe 1...************************...c...cga bta-mir-xxe-5p oan-mir-xxe 1.t...***********************..g... c.ga...**********************...a.. 94 oan-mir-xxe,oan-mir-xxe* cfa-mir-xxe 1...a...g.********************** 64 cfa-mir-xxe 275_count= _count= _count= _count= _count= _count= _count= _count= c _count= c _count= c _count= t _count= c _count= a _count= c _count=8 1...c _count=7 1...c _count=5 1...a _count=4 1...t _count=3 1...t _count=3 1...c _count=3 1...t _count=3 1...c _count= _count= _count= _count= _count= _count= t _count= g _count= g _count= g _count=9 1...a 23 1

18 sample1_fhg_gp1_align 19877_count=6 1...g _count=3 1...t 21 1 clusterno=2 chr=1 gi=nt_ strand=1 #mirs=5 #copy(all)=156 #family(all)=7 #copy(0error)=137 #family(0error)=4 #copy(1error)=19 #family(1error)=3 genome CTCTTGCGAAAAATAAATAAACGCTCAATTAGATGGCGGCGGATTGGGTCCCCCCTAGAAGCGACAGGGTTGCTGCTGAACTCGGTGGTTCTGTGAG #error bta-mir- XXc 4 a..g...c...***********************...g...g. 104 bta-mir-xxc hsa-mir-xxc ***********************...********************** hsa-mir-xxc ptr-mir-xxc *********************** ptr-mir-xxc mmu-mir-xxc t...***********************...********************** mmu-mir-xxc cfa-mir-xxc-1 1 ************************ cfa-mir-xxc 1630_count= _count= _count= _count= a _count=6 1...a _count=3 1...t _count= clusterno=3 chr=1 gi=nt_ strand=1 #mirs=5 #copy(all)=22 #family(all)=2 #copy(0error)=19 #family(0error)=1 #copy(1error)=3 #family(1error)=1 genome TCATAAAAATGTCGAGGAATGGCGGCTCGCGTAGACCCGCACCCCACCCCTTCGAAGCTCATTGCGTCAGTTCCACGATTC #error mmu-mir-xxx 1...********************** mmu-mir-xxx bta-mir-xxx 1...**********************.g...a.. 80 bta-mir-xxx hsa-mir-xxx 1...********************** hsa-mir-xxx mne-mir-xxx 1...*******************G** mne-mir-xxx cfa-mir-xxx 1...********************** 61 cfa-mir-xxx 6971_count= _count=3 1...a 24 1 clusterno=4 chr=1 gi=nt_ strand=1 #mirs=2 #copy(all)=33402 #family(all)=49 #copy(0error)=481 #family(0error)=12 #copy(1error)=32921 #family(1error)=37 genome GCTCCCCTATAAGAAGCGCGGAAGCCGGCTTATATGTTTCCCCATTATCATCGAACTTTCGATTGGGCCCCGTAACTCT #error ptr-mir-xxx ********************** ptr-mir-xxx hsa-mir-xxx ********************** hsa-mir-xxx 1326_count= _count= _count= _count= _count= _count= _count= _count= _count= g _count= g _count= g _count= g _count= t _count= t _count= g _count= a _count= t _count= g _count= g _count= t 21 1
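Convention 1 of the alignment printout states that a "." marks a base identical to the reference, and the printed rows show differing bases as lowercase letters. The sketch below reproduces that style of rendering for an ungapped placement. It is an illustration only: the function name, the leading-space padding, and the short sequences are assumptions, and the "*" marking of mature regions and the #error annotations are not modeled.

```python
def render_alignment(reference, read, offset):
    """Render read against reference: '.' for a match, lowercase base for a mismatch."""
    window = reference[offset:offset + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    marks = ["." if r == w else r.lower() for r, w in zip(read, window)]
    # Pad with spaces so the row lines up under the reference when printed.
    return " " * offset + "".join(marks)


reference = "GTATGCCTTAAC"  # invented reference fragment
exact_row = render_alignment(reference, "ATGCCT", 2)
mismatch_row = render_alignment(reference, "ATGCGT", 2)

print(reference)
print(exact_row)
print(mismatch_row)
```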

19 sample1_fhg_gp1_mirlist #sequ_seq_id seq length clusterno=1: mir 40e 3p _count=837 AAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA _count=62 AAATTGTCGTCCGAACGACCCA 23 CTAAATTGTCGTCCGAACGACCCA _count=32 AAATTGTCGTCCGAACGACCCA 23 CTAAATTGTCGTCCGAACGACCCA _count=12 CTAAATTGTCGTCCGAACGACC 22 CTAAATTGTCGTCCGAACGACCCA _count=7 CTAAATTGTCGTCCGAACGACCCA 25 CTAAATTGTCGTCCGAACGACCCA _count=7 CTAAATTGTCGTCCGAACGACC 22 CTAAATTGTCGTCCGAACGACCCA _count=5 TAAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA _count=4 AAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA _count=3 TAAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA _count=3 AAATTGTCGTCCGAACGAC 19 CTAAATTGTCGTCCGAACGACCCA _count=3 AAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA _count=3 CTAAATTGTCGTCCGAACGA 20 CTAAATTGTCGTCCGAACGACCCA 1 3 clusterno=1: mir 40e 5p _count=133 AAGAGTGCGTTGATTGTGGGTA 22 TCAAGAGTGCGTTGATTGTGGGTA _count=61 CAAGAGTGCGTTGATTGTGGG 21 TCAAGAGTGCGTTGATTGTGGGTA _count=4 AAGAGTGCGTTGATTGTGG 19 TCAAGAGTGCGTTGATTGTGGGTA _count=4 AAGAGTGCGTTGATTGTGGGT 21 TCAAGAGTGCGTTGATTGTGGGTA _count=4 AAGAGTGCGTTGATTGTGGG 20 TCAAGAGTGCGTTGATTGTGGGTA _count=168 CAAGAGTGCGTTGATTGTGGGT 22 TCAAGAGTGCGTTGATTGTGGGTA _count=80 AAGAGTGCGTTGATTGTGGGTA 22 TCAAGAGTGCGTTGATTGTGGGTA _count=19 AAGAGTGCGTTGATTGTGGGT 21 TCAAGAGTGCGTTGATTGTGGGTA _count=12 TCAAGAGTGCGTTGATTGTGG 21 TCAAGAGTGCGTTGATTGTGGGTA _count=9 TCAAGAGTGCGTTGATTGTGGGT 23 TCAAGAGTGCGTTGATTGTGGGTA _count=6 AAGAGTGCGTTGATTGTGGG 20 TCAAGAGTGCGTTGATTGTGGGTA _count=3 CAAGAGTGCGTTGATTGTGGG 21 TCAAGAGTGCGTTGATTGTGGGTA _count=3 AAGAGTGCGTTGATTGTGGGTA 22 TCAAGAGTGCGTTGATTGTGGGTA _count=3 AAGAGTGCGTTGATTGTGGGTA 23 TCAAGAGTGCGTTGATTGTGGGTA _count=3 AAGAGTGCGTTGATTGTGGGTA 23 TCAAGAGTGCGTTGATTGTGGGTA 3 3

20 sample1_fhg_gp1_mirlist clusterno=2: mir 40c 5p _count=106 AGTGGAGAGTGCCGCGTGTCTCG 24 GAGTGGAGAGTGCCGCGTGTCTCG _count=19 GAGTGGAGAGTGCCGCGTGTCTC 23 GAGTGGAGAGTGCCGCGTGTCTCG _count=7 AGTGGAGAGTGCCGCGTGTCTC 22 GAGTGGAGAGTGCCGCGTGTCTCG _count=10 AGTGGAGAGTGCCGCGTGTCTCG 24 GAGTGGAGAGTGCCGCGTGTCTCG _count=6 AGTGGAGAGTGCCGCGTGTCTCG 25 GAGTGGAGAGTGCCGCGTGTCTCG _count=3 GAGTGGAGAGTGCCGCGTGTCTCG 25 GAGTGGAGAGTGCCGCGTGTCTCG 1 2 clusterno=2: mir 40c 3p _count=5 ATCGCAGAATGCGCCTTGAT 22 CATCGCAGAATGCGCCTTGAT 2 1 clusterno=3: mir p _count=9 AACTAGCGGTCTCTTTCGCGT 21 AACTAGCGGTCTCTTTCGCGTGGA _count=13 ACTAGCGGTCTCTTTCGCGTGG 22 AACTAGCGGTCTCTTTCGCGTGGA 2

21 sample1_fhg_gp1_sum input file: sample1/miralign/sample1_fhg_align.txt for mirs input file: sample1/db/sample1_fhg_db.fa for sequ seq input file: sample1/output/sample1_fhg_gp1.txt for cluster & genomeseq Title: Position clusters of mirs mapping to genome input file: sample1/lists/sample1_fhg_align_chry.txt for sequence start position input file: sample1/miralign/sample1_fhg_matchedmirs.fa for mapped mir IDs input file: sample1/lists/sample1_fhg_align_chr1.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr2.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr3.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr4.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr5.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr6.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr7.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr8.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr9.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr10.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr11.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr12.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr13.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr14.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr15.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr16.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr17.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr18.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr19.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr20.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr21.txt for alignment data input file: sample1/lists/sample1_fhg_align_chr22.txt for 
alignment data input file: sample1/lists/sample1_fhg_align_chrx.txt for alignment data input file: sample1/lists/sample1_fhg_align_chry.txt for alignment data unique mammalian mirs mapped by sequ seq: 6456 # of unique mammalian mirs mapped by sequ seq & genome: 5256; 5256/6456=85.6% # of position clusters of the mapped mirs: 574 length of the genome: position cluster: the distance of two near positions in a clustaer is < 50 For the mirs mapped by sequ seq and genome after re alignment: # of position clusters: 457 # of mammalian mirs mapped to sequ seq and genome: 4839 # of unique sequ seq: # of unique sequ family: 5330 Note: represent mir at 5p or 3p is the sequenced sequence of lowest error# in each cluster and highest copy# the mir_name is composed of 4 parts, mir, extension number, 5p or 3p, index of sequ seq. The extension number is the extension number in mirids of highest occurency # of unique mirs detected: 563; # of unique mirs is counted based on mir_name.

22 sample1_fhg_gp1_sum Index copy#(all isoforms in 5p family#(all isoforms in 5p or 3p) chr# chr_seqid strand mir_start mir_end mir_start mir_end #mirs mirids clustern o mir_name mir_seq mir_len copy# of the isoform or 3p) 1 1 mir 30e 5p 275 TGTAAACATCCTTGACTGGAAGCT NT_ hsa mir XXX 2 20 mir 28 3p 2238 CACTAGATTGTGAGCTCCTGGA NT_ hsa mir XXX 3 57 mir 27b 3p 127 TTCACAGTGGCTAAGTTCTGC NT_ hsa mir XXX 4 87 mir 625 3p GACTATAGAACTTTCCCCCTCA NT_ hsa mir XXX 5 98 mir p AGGGTAGATAGAACAGGTCTTG NT_ hsa mir XXX mir 21 3p 413 CAACACCAGTCGATGGGCTGTC NT_ hsa mir XXX mir 7e 3p CTATACGGCCTCCTAGCTTTCC NT_ hsa mir XXX mir 101 3p 121 GTACAGTACTGTGATAACTGAA NT_ hsa mir XXX mir 135b 3p ATGTAGGGCTAAAAGCCATGGG NT_ hsa mir XXX mir 425 3p 5257 ATCGGGAATGTCGTGTCCGCC NT_ hsa mir XXX mir 548p 3p CCAAAACTGCAGTTACTTTTGC NT_ hsa mir XXX mir 31 5p 412 AGGCAAGATGCTGGCATAGCTG NT_ hsa mir XXX mir 548l 5p AAAAGTATTTGCGGGTTTTGTC NT_ hsa mir XXX mir 125b 5p 90 TCCCTGAGACCCTAACTTGTGA NT_ hsa mir XXX mir p 5124 TCTGGGTGGTCTGGAGATTTGTG NT_ hsa mir XXX mir 16 3p 9924 CCAGTATTAACTGTGCTGCTGA NT_ hsa mir XXX mir p GAGGCAGAAGCAGGATGACAA NT_ hsa mir XXX mir 33b 5p 3282 GTGCATTGCTGTTGCATTGCA NT_ hsa mir XXX mir 454 3p 8695 TAGTGCAATATTGCTTATAGGGTTT NT_ hsa mir XXX mir 320 3p 1624 AAAAGCTGGGTTGAGAGGG NT_ hsa mir XXX mir 548j 5p AAAAGTAATTGCGGTCTTTGGT NT_ hsa mir XXX mir 659 5p AGGACCTTCCCTGAACCAAGGA NT_ hsa mir XXX mir p ACGCCCTTCCCCCCCTTCTTCA NT_ hsa mir XXX mir 221 5p 816 ACCTGGCATACAATGTAGATTTCT X NT_ hsa mir XXX mir 221 3p 5 AGCTACATTGTCTGCTGGGTTTC X NT_ hsa mir XXX mir 222 5p 4253 CTCAGTAGCCAGTGTAGATCC X NT_ hsa mir XXX mir 222 3p 23 AGCTACATCTGGCTACTGGGTCTC X NT_ hsa mir XXX mir 548i 5p AAAAGTACTTGCGGATTTTGC X NT_ hsa mir XXX mir 361 5p 1854 TTATCAGAATCTCCAGGGGTAC X NT_ hsa mir XXX mir 361 3p TCCCCCAGGTGTGATTCTGATTT X NT_ hsa mir XXX mir 421 3p 3255 ATCAACAGACATTAATTGGGCGC X NT_ hsa mir XXX

23 sample1_fhg_clusterposition
input from sample1/0finalreport/3_sample1_fhg_gp1_sum.txt
input from sample1/0finalreport/4_sample1_fhg_gp2_sum.txt
input from sample1/0finalreport/6_sample1_fhg_gp4_sum.txt
The position refers to the start position of the mir in the human genome. ESTs are added one by one after chromosome X.
The PositionInSeq refers to the position of the mir in its own contig sequence or EST.
Refer to the three input files above for the definitions of mir_name & mir_seq.
The clusterdistance is the difference between the positions of the current and the previous cluster.
Minimum c 1 Maximum c
Columns: Index, Position, clusterDistance, #Copy (all isoforms in 5p or 3p), #family (all isoforms in 5p or 3p), Chr#, Strand, StartPositionInSeq, EndPositionInSeq, Type, unique_mirs, mir_name, mir_seq
predict (gp4) PC 5p XXXXX GCGGCACTGAGGCTTATAGCGGAA
predict (gp4) PC 3p XXXXX TGTACGGCCATCCAGCTCTAGGCC
predict (gp4) PC 5p XXXXX GGAATAGCACATCAAGTAGGT
predict (gp4) PC 5p XXXXX CACGGCCATTAGACGACGCCGGG
predict (gp4) PC 3p XXXXX TGTACGTAAAGTGACTCCACTAA
predict (gp4) PC 5p XXXXX TGATAGGCCCTACTGTCCATGTT
known (gp1) hsa mir XXX mir XXX 5p XXX TATGCCAGGCAGTTATACCAT
known (gp1) hsa mir XXX mir XXX 5p XXX CAAGCGTTTGTCAACAAAGTGTTGA
predict (gp4) PC 5p XXXXX TTGTCCGATTATGTGCTCG
known (gp1) mmu mir XXX;mml mir XXX;b mir XXX 5p XXX CCGCGACGTTTTCGGGACCGA
known (gp1) mmu mir XXX;mml mir XXX;b mir XXX 3p XXX CCAAACTCGCGAACTAG
known (gp1) mmu mir XXX;mml mir XXX;b mir XXX 5p XXX GAACTCTACGAATCATCCTAGTATG
known (gp1) mmu mir XXX;mml mir XXX;b mir XXX 3p XXX GCTGCCTCCGTACGATGCTA
predict (gp4) PC 5p XXXXX ATCTAATGTGGGTGACACTGGT
predict (gp4) PC 5p XXXXX GGGGTTTAGGGTACCCGCTTCTG
predict (gp4) PC 5p XXXXX TGTTGAGCGATTGCATGCAACTTA
predict (gp4) PC 5p XXXXX TGGGTGCGTGTGGTCACGTC
predict (gp4) PC 5p XXXXX TGCGCGTCTTTATTATC
predict (gp4) PC 5p XXXXX CCGTGATTGGACCGTCGCGTTCGT
predict (gp4) PC 5p XXXXX CACTGCCGAACGATCTGTGATTCC
predict (gp4) PC 3p XXXXX TTCACGCTGGGTTATATCTCTCGC
predict (gp4) PC 5p XXXXX CCTCTCCTGGTTAGTCCA
predict (gp4) PC 3p XXXXX CCAAGCAGTCTGGCATCTTATGC
known (gp1) mmu mir XXX;mml mir XXX;b mir XXX 3p XXX GTATCGCTATCGCCCAGAGCGTCG
predict (gp4) PC 5p XXXXX CACCCAAGGACCCCGCC
known (gp1) ssc mir XXX mir XXX 3p XXX GCGCCTTCCGCCGATTTTGT
predict (gp4) PC 3p XXXXX ACGATACTGTACTCGGG
predict (gp4) PC 5p XXXXX CCAAGAGGTGTGTTGAGCA
predict (gp4) PC 5p XXXXX AGTTTTCGCACGGCGTGTCAT
predict (gp4) PC 5p XXXXX TGCTTATGCAGCTTTGTAGCCT
known (gp1) mmu mir XXX;mml mir XXX;b mir XXX 5p XXX AAGGCGGGTCTACTAAGGGGAGC
predict (gp4) PC 5p XXXXX GAAACCAGCTAAGCAATGC
known (gp1) mmu mir XXX;mml mir XXX;b mir XXX 5p XXX GGGCATAACTGTGGGCTGAC
predict (gp4) PC 5p XXXXX TTGAGGTCCGTTCCTCAGTCGACCT
predict (gp4) PC 3p XXXXX TAGAGGTAGCCACAAGGATAGCG
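The clusterdistance column is described as the difference between the positions of the current and the previous cluster. A minimal sketch of that calculation follows, assuming the clusters are ordered by position and that the first cluster has no distance. The function name and the example positions are hypothetical, since the actual coordinates are not preserved in this transcription.

```python
def cluster_distances(cluster_positions):
    """Return the distance from each cluster to the previous one.

    Assumption: clusters are ordered by start position and the
    first cluster, having no predecessor, gets None.
    """
    ordered = sorted(cluster_positions)
    return [None] + [cur - prev for prev, cur in zip(ordered, ordered[1:])]


# Hypothetical cluster start positions, for illustration only.
positions = [1200, 1201, 4500, 98000]
distances = cluster_distances(positions)
known = [d for d in distances if d is not None]
print(distances)              # [None, 1, 3299, 93500]
print(min(known), max(known)) # 1 93500
```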

24 Table 1 - Data summary
Raw Data: #SequSeq #UniqueSeq
# Raw Sequ seq 9,922,513 1,592,666
Data Processing: #SequSeq %SequSeq #UniqueSeq %UniqueSeq
1. Impurity sequences filtered 948, % 623, %
2. Copy# < 3 filtered 913, % 845, %
3. Length < 15 filtered 204, % 56, %
4. mRNA, Rfam, Repbase filtered 362, % 11, %
5. Final Mappable 7,493, % 55, %
Total 9,922, % 1,592, %

Table 2 - Length distribution of mappable data
Columns: Length, #SequSeq, %FinalMappable SequSeq, #UniqueSeq, %FinalMappable UniqueSeq, #SequSeq/#UniqueSeq
15 52, % % , % 1, % , % % , % 1, % , % % , % 1, % , % 3, % ,438, % 16, % ,810, % 8, % , % 2, % , % % , % % , % % , % % , % % , % % , % 15, % 7.4
Final Mappable 7,493, % 55, % 135.1
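The relationship between sequ seq, unique seq, and copy number, together with the copy# and length filters listed in Table 1, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used for this report: the function names are hypothetical, the impurity and mRNA/Rfam/Repbase filters are omitted because they need external databases, and the toy reads are made up.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse raw reads (sequ seq) into unique seq families with copy numbers."""
    return Counter(reads)


def filter_mappable(unique_seqs, min_copy=3, min_len=15):
    """Drop families with copy# < min_copy or length < min_len (Table 1, steps 2 and 3)."""
    return {seq: copies for seq, copies in unique_seqs.items()
            if copies >= min_copy and len(seq) >= min_len}


def summarize(unique_seqs):
    """Return (#SequSeq, #UniqueSeq, #SequSeq/#UniqueSeq)."""
    n_reads = sum(unique_seqs.values())
    n_unique = len(unique_seqs)
    ratio = n_reads / n_unique if n_unique else 0.0
    return n_reads, n_unique, ratio


# Toy reads, for illustration only.
reads = (["TGTAAACATCCTTGACTGGAAGCT"] * 5    # kept: length 24, copy# 5
         + ["CACTAGATTGTGAGCTCCTGGA"] * 3    # kept: length 22, copy# 3
         + ["ACGTACGTACGT"] * 4              # dropped: length 12 < 15
         + ["TTCACAGTGGCTAAGTTCTGC"] * 2)    # dropped: copy# 2 < 3

raw = collapse_reads(reads)
mappable = filter_mappable(raw)
print(summarize(raw))       # (14, 4, 3.5)
print(summarize(mappable))  # (8, 2, 4.0)
```

The last value returned by `summarize` corresponds to the #SequSeq/#UniqueSeq column of Table 2.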

25 Table 3 - Unique seq mapped to unique mammalian mirs
Sequ Seq Unique Seq
#SequSeq %SequSeq %FinalMappable #UniqueSeq %UniqueSeq %FinalMappable
# Known mammalian unique mir in mirbase v14.0 3,924
# Known mammalian unique mir in mirbase v14.0 2,656
# Unique mir in mirbase mapped 1,599
# Unique mir in mirbase mapped 2,273
Mapped to mirbase 6,013, % 80.25% 12, % 22.46%
# Known hsa mir in mirbase v
# Known hsa mir in mirbase v
# Unique hsa mir in mirbase mapped 286
# Unique hsa mir in mirbase mapped 399
Mapped to hsa of mirbase 5,937, % 79.24% 8, % 15.13%

Table 4 - Cluster I: Sequ seq mapped to mammalian mirs that further mapped to genome
Mapping to mammalian:
Sequ Seq Unique Seq
#SequSeq %SequSeq %FinalMappable #UniqueSeq %UniqueSeq %FinalMappable
Cluster I 5,734, % 76.53% 8, % 15.03%
# Alignment-cluster in Cluster I 389
# Unique mir in Cluster I 1,456
# Unique mir in Cluster I 309
FileName Mapped_Data/sample1_FHG_gp1_Align.txt
FileName Mapped_Data/sample1_FHG_gp1_Sum.txt
FileName Mapped_Data/sample1_FHG_gp1_miRlist.txt
Mapping to species hsa:
Sequ Seq Unique Seq
#SequSeq %SequSeq %FinalMappable #UniqueSeq %UniqueSeq %FinalMappable
Cluster I 5,734, % 76.53% 8, % 15.01%
# Unique hsa mir in Cluster I 295
# Unique hsa mir in Cluster I 399

26 Table 5 - Cluster II: Sequ seq mapped to both mammalian mirs and genome, but the mirs unmapped to genome
Mapping to mammalian:
Sequ Seq Unique Seq
#SequSeq %SequSeq %FinalMappable #UniqueSeq %UniqueSeq %FinalMappable
Cluster II % 0.00% % 0.02%
# Alignment-cluster in Cluster II 7
# Unique mir in Cluster II 3
# Unique mir in Cluster II 4
FileName Mapped_Data/sample1_FHG_gp2_Align.txt
FileName Mapped_Data/sample1_FHG_gp2_Sum.txt
FileName Mapped_Data/sample1_FHG_gp2_miRlist.txt

Table 6 - Cluster III: Sequ seq mapped to mammalian mirs, but the mirs unmapped to genome (sequence cluster)
Mapping to mammalian:
Sequ Seq Unique Seq
#SequSeq %SequSeq %FinalMappable #UniqueSeq %UniqueSeq %FinalMappable
Cluster III 2, % 0.04% % 0.26%
# Alignment-cluster in Cluster III 23
# Unique mir in Cluster III 37
# Unique mir in Cluster III 18
FileName Mapped_Data/sample1_FHG_gp3_Align.txt
FileName Mapped_Data/sample1_FHG_gp3_Sum.txt
FileName Mapped_Data/sample1_FHG_gp3_miRlist.txt
Mapping to species hsa:
Sequ Seq Unique Seq
#SequSeq %SequSeq %FinalMappable #UniqueSeq %UniqueSeq %FinalMappable
Cluster III % 0.00% % 0.00%
# Unique hsa mir in Cluster III 0
# Unique hsa mir in Cluster III 0

27 Table 7 - Cluster IV: Sequ seq mapped to genome, but unmapped to mammalian mirs (new hairpins predicted by mfold)
Sequ Seq Unique Seq
#SequSeq %SequSeq %FinalMappable #UniqueSeq %UniqueSeq %FinalMappable
Cluster IV 18, % 0.25% 1, % 2.53%
# Alignment-cluster in Cluster IV 1,095
# Unique mir in Cluster IV 703
FileName Mapped_Data/sample1_FHG_gp4_Align.txt
FileName Mapped_Data/sample1_FHG_gp4_Sum.txt
FileName Mapped_Data/sample1_FHG_gp4_miRlist.txt

Table 8 - Unmapped Sequ Seq
Sequ Seq Unique Seq
#SequSeq %SequSeq %FinalMappable #UniqueSeq %UniqueSeq %FinalMappable
Nohit 847, % 11.31% 40, % 72.15%
FileName Mapped_Data/sample1_FHG_nohit.txt

Table 9 - Mapping summary
#SequSeq %SequSeq #UniqueSeq %UniqueSeq
Raw 9,922, % 1,592, %
Mappable 7,493, % 55, %
Mapped to mirbase (including nohit 1) 6,013, % 12, %
Mapped to Cluster I 5,734, % 8, %
Mapped to Cluster II % %
Mapped to Cluster III 2, % %
Mapped to Cluster IV 18, % 1, %
Mapped (total) 5,756, % 9, %
Nohit (including nohit 1 and nohit 2) 847, % 40, %
Note: Mapped (total) + Nohit should equal Mappable.

Table 10 - Detected mir summary
# of unique mirs detected in Cluster I 309
# of unique mirs detected in Cluster II 4
# of unique mirs detected in Cluster III 18
# of unique mirs detected in Cluster IV 703
Total 1,034
FileName Mapped_Data/sample1_FHG_uni_miRs.txt
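The four cluster definitions in Tables 4 to 7 amount to a small decision rule, sketched below under stated assumptions. The function and its boolean arguments are hypothetical. The sketch assumes that Cluster III differs from Cluster II in that the sequ seq itself does not map to the genome, and that a genome-mapped read without a predicted hairpin counts as Nohit; neither point is stated explicitly in the tables. The final lines re-add the Table 10 counts.

```python
def classify(seq_hits_mir, mir_hits_genome, seq_hits_genome, hairpin_predicted=False):
    """Assign a unique seq to Cluster I-IV or Nohit (illustrative rule only)."""
    if seq_hits_mir:
        if mir_hits_genome:
            return "Cluster I"    # seq -> mir, and the mir maps to the genome
        if seq_hits_genome:
            return "Cluster II"   # seq -> mir and genome, mir itself unmapped
        return "Cluster III"      # assumption: seq -> mir only
    if seq_hits_genome and hairpin_predicted:
        return "Cluster IV"       # seq -> genome only, with a predicted hairpin
    return "Nohit"                # assumption: everything else


print(classify(True, True, True))           # Cluster I
print(classify(True, False, True))          # Cluster II
print(classify(True, False, False))         # Cluster III
print(classify(False, False, True, True))   # Cluster IV
print(classify(False, False, False))        # Nohit

# Counts of unique mirs per cluster as listed in Table 10.
detected = {"Cluster I": 309, "Cluster II": 4, "Cluster III": 18, "Cluster IV": 703}
total_detected = sum(detected.values())
print(total_detected)  # 1034
```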

28 Note: Definition of mir_name:
in cluster group 1: the mir_name is composed of 4 parts: mir, extension number, 5p or 3p, and index of sequ seq. The extension number is the extension number of highest occurrence in the mirids.
in cluster group 2: the mir_name is composed of 4 parts: PC, extension number, 5p or 3p, and index of sequ seq. PC means 'Predicted Candidate'. The extension number is the extension number of highest occurrence in the mirids.
in cluster group 3: the mir_name is composed of 4 parts: PN, extension number, 5p or 3p, and index of sequ seq. PN means 'Predicted Novel mir'. The extension number is the extension number of highest occurrence in the mirids.
in cluster group 4: the mir_name is composed of 3 parts: PC, 5p or 3p, and index of sequ seq. PC means 'Predicted Candidate'.
In cluster groups 1, 2 & 4, the mir sequence is taken from the isoform with the lowest error# and the highest copy#.
In cluster group 3, the mir sequence is taken from the isoform with the highest copy#, regardless of its error#.
The # of unique mirs detected in these 4 groups is counted from distinct mir_seq.
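The naming and representative-isoform rules above can be sketched in a few lines. Several details are assumptions rather than facts from the report: the parts are joined with a hyphen (the separator is lost in this transcription), the extension number is taken as the last hyphen-separated field of a miRBase-style ID such as hsa-mir-30e (a simplification that ignores suffixes like -1), and "lowest error# and highest copy#" is read as lowest error first with copy number as tie-breaker. All function names and the toy isoforms are hypothetical.

```python
from collections import Counter

# Name prefix per cluster group, as given in the note above.
PREFIX = {1: "mir", 2: "PC", 3: "PN", 4: "PC"}


def most_common_extension(mirids):
    """Return the extension number occurring most often among the mirids."""
    extensions = [mirid.split("-")[-1] for mirid in mirids]
    return Counter(extensions).most_common(1)[0][0]


def make_mir_name(group, arm, index, mirids=()):
    """Compose a mir_name; group 4 names carry no extension number."""
    parts = [PREFIX[group]]
    if group != 4:
        parts.append(most_common_extension(mirids))
    parts += [arm, str(index)]
    return "-".join(parts)  # assumption: hyphen as separator


def representative(isoforms, group):
    """Pick the representative isoform from (seq, error#, copy#) tuples."""
    if group == 3:
        return max(isoforms, key=lambda iso: iso[2])          # highest copy#, any error#
    return min(isoforms, key=lambda iso: (iso[1], -iso[2]))   # lowest error#, then highest copy#


print(make_mir_name(1, "5p", 275, ["hsa-mir-30e", "mmu-mir-30e", "hsa-mir-30d"]))  # mir-30e-5p-275
print(make_mir_name(4, "3p", 12))                                                   # PC-3p-12

toy = [("AAA", 1, 90), ("AAT", 0, 10), ("AAC", 0, 40)]  # made-up isoforms
print(representative(toy, 1))  # ('AAC', 0, 40)
print(representative(toy, 3))  # ('AAA', 1, 90)
```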


More information

ChIP seq peak calling. Statistical integration between ChIP seq and RNA seq

ChIP seq peak calling. Statistical integration between ChIP seq and RNA seq Institute for Computational Biomedicine ChIP seq peak calling Statistical integration between ChIP seq and RNA seq Olivier Elemento, PhD ChIP-seq to map where transcription factors bind DNA Transcription

More information

Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance

Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance Use of Agilent Feature Extraction Software (v8.1) QC Report to Evaluate Microarray Performance Anthea Dokidis Glenda Delenstarr Abstract The performance of the Agilent microarray system can now be evaluated

More information

Genome sequence of Plasmopara viticola and insight into the pathogenic mechanism

Genome sequence of Plasmopara viticola and insight into the pathogenic mechanism Genome sequence of Plasmopara viticola and insight into the pathogenic mechanism Ling Yin 1,3,, Yunhe An 1,2,, Junjie Qu 3,, Xinlong Li 1, Yali Zhang 1, Ian Dry 5, Huijun Wu 2*, Jiang Lu 1,4** 1 College

More information

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid. 1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the

More information

U.S. Patent No. 9,051,563 and other pending patents. Ver

U.S. Patent No. 9,051,563 and other pending patents. Ver INSTRUCTION MANUAL Direct-zol 96 RNA Catalog Nos. R2054, R2055, R2056 & R2057 Highlights Quick, 96-well purification of high-quality (DNA-free) total RNA directly from TRIzol, TRI Reagent and all other

More information

mrna Isolation Kit for Blood/Bone Marrow For isolation mrna from blood or bone marrow lysates Cat. No

mrna Isolation Kit for Blood/Bone Marrow For isolation mrna from blood or bone marrow lysates Cat. No For isolation mrna from blood or bone marrow lysates Cat. No. 1 934 333 Principle Starting material Application Time required Results Key advantages The purification of mrna requires two steps: 1. Cells

More information

Supplemental Figure 1.

Supplemental Figure 1. Supplemental Material: Annu. Rev. Genet. 2015. 49:213 42 doi: 10.1146/annurev-genet-120213-092023 A Uniform System for the Annotation of Vertebrate microrna Genes and the Evolution of the Human micrornaome

More information

RNA Labeling Kit. User Manual

RNA Labeling Kit. User Manual RNA Labeling Kit User Manual RNA Labeling Kit The RNA Labeling Kit contains reagents to perform 10 transcription reactions (50 µl each) and 12 independent labeling reactions. Introduction and product description:

More information

OECD QSAR Toolbox v.4.1. Tutorial on how to predict Skin sensitization potential taking into account alert performance

OECD QSAR Toolbox v.4.1. Tutorial on how to predict Skin sensitization potential taking into account alert performance OECD QSAR Toolbox v.4.1 Tutorial on how to predict Skin sensitization potential taking into account alert performance Outlook Background Objectives Specific Aims Read across and analogue approach The exercise

More information

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,

More information

Accountability. User Guide

Accountability. User Guide Accountability User Guide The information in this document is subject to change without notice and does not represent a commitment on the part of Horizon. The software described in this document is furnished

More information

The Research Plan. Functional Genomics Research Stream. Transcription Factors. Tuning In Is A Good Idea

The Research Plan. Functional Genomics Research Stream. Transcription Factors. Tuning In Is A Good Idea Functional Genomics Research Stream The Research Plan Tuning In Is A Good Idea Research Meeting: March 23, 2010 The Road to Publication Transcription Factors Protein that binds specific DNA sequences controlling

More information

Extrel is widely respected for the quality of mass spectrometer systems that are

Extrel is widely respected for the quality of mass spectrometer systems that are Extrel is widely respected for the quality of mass spectrometer systems that are available to the world's top research scientists. In response to increasing requests for complete turn-key systems built

More information

13.4 Gene Regulation and Expression

13.4 Gene Regulation and Expression 13.4 Gene Regulation and Expression Lesson Objectives Describe gene regulation in prokaryotes. Explain how most eukaryotic genes are regulated. Relate gene regulation to development in multicellular organisms.

More information

Alignment-free RNA-seq workflow. Charlotte Soneson University of Zurich Brixen 2017

Alignment-free RNA-seq workflow. Charlotte Soneson University of Zurich Brixen 2017 Alignment-free RNA-seq workflow Charlotte Soneson University of Zurich Brixen 2017 The alignment-based workflow ALIGNMENT COUNTING ANALYSIS Gene A Gene B... Gene X 7... 13............... The alignment-based

More information

NINE CHOICE SERIAL REACTION TIME TASK

NINE CHOICE SERIAL REACTION TIME TASK instrumentation and software for research NINE CHOICE SERIAL REACTION TIME TASK MED-STATE NOTATION PROCEDURE SOF-700RA-8 USER S MANUAL DOC-025 Rev. 1.3 Copyright 2013 All Rights Reserved MED Associates

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis: Protein synthesis uses the information in genes to make proteins. 2 Steps

More information

Please click the link below to view the YouTube video offering guidance to purchasers:

Please click the link below to view the YouTube video offering guidance to purchasers: Guide Contents: Video Guide What is Quick Quote? Quick Quote Access Levels Your Quick Quote Control Panel How do I create a Quick Quote? How do I Distribute a Quick Quote? How do I Add Suppliers to a Quick

More information

Firefly Luciferase 1. ATP + Luciferin AMP + Oxyluciferin + Light (565 nm)

Firefly Luciferase 1. ATP + Luciferin AMP + Oxyluciferin + Light (565 nm) Berthold Detection Systems GmbH Bleichstrasse 56 68 D-75173 Pforzheim/Germany Phone: +49(0)7231/9206-0 Fax: +49(0)7231/9206-50 E-Mail: contact@berthold-ds.com Internet: www.berthold-ds.com Dual Luciferase

More information

Ribo-Zero Magnetic Kit*

Ribo-Zero Magnetic Kit* Ribo-Zero Magnetic Kit* (Bacteria) Cat. No. MRZMB126 (Contains 1 box of Cat. No. RZMB11086 and 1 box of Cat. No. MRZ116C) Cat. No. MRZB12424 24 Reactions (Contains 1 box of Cat. No. RZMB12324 and 1 box

More information

DEVELOP YOUR LAB, YOUR WAY

DEVELOP YOUR LAB, YOUR WAY Technical brochure DEVELOP YOUR LAB, YOUR WAY The FLOW Solution: Expand your potential today Redesign your lab with the FLOW Solution The FLOW Solution is a highly flexible modular, semi-automated data

More information

RNA- seq read mapping

RNA- seq read mapping RNA- seq read mapping Pär Engström SciLifeLab RNA- seq workshop October 216 IniDal steps in RNA- seq data processing 1. Quality checks on reads 2. Trim 3' adapters (opdonal (for species with a reference

More information