RNAseq Applications in Genome Studies. Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford
Transcription
1 RNAseq Applications in Genome Studies Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford
2 RNAseq Protocols
- Next generation sequencing protocol
- cDNA, not RNA sequencing
- Types of libraries available:
  - Total RNA sequencing
  - polyA+ RNA sequencing
  - Small RNA sequencing
- Special protocols:
  - DSN treatment
  - Ribominus
  - SMARTer: Ultra Low RNA sequencing protocol
- Strand-specific sequencing
  - Sequencing only the + or - strand
- Mostly paired-end
3 Genome Study Applications
- Transcriptome analysis
  - Identifying new transcribed regions
  - Expression profiling
  - Alternative splicing studies
- Resequencing to find genetic polymorphisms:
  - SNPs, micro-indels
  - CNVs
4 cDNA Synthesis
5 Arrays vs RNAseq (1)
- The correlation of fold changes between arrays and RNAseq is similar to the correlation between array platforms (0.73)
- Technical replicates are almost identical, so there is no need to run them
- Extra analyses: prediction of alternative splicing, SNPs
- Results for low- and high-expressed genes do not match
6 Array vs RNAseq (2)
7 Data processing and analysis
- Alignment
  - Splice-aware
- Reads counting/preprocessing
  - Adaptor trimming
  - Counting
    - Overlapping genes
    - Strand-specific sequencing protocols
  - Sanity checks
- Expression studies
  - Differential expression
  - Alternative splicing
  - GO and pathway analysis
8 Dataflow and Formats
- Illumina pipeline (FASTQ)
- Alignment (BAM)
- Preprocessing (FASTQ/FASTA)
- Expression profiles/RNA abundance (BED, GTF)
- Splice variants (GTF)
- SNP analysis (VCF)
9 Software
- Short reads aligners
  - TopHat, STAR
- Data preprocessing (reads statistics, adapter clipping, formats conversion, read counters)
  - Fastx toolkit
  - HTSeq
  - samtools
- Expression studies
  - Cufflinks, cuffdiff, cuffcompare
  - RSEQtools
  - R packages (DESeq, edgeR, baySeq, DEGseq, Genominator)
- Alternative splicing
  - Cufflinks
  - MISO
  - Augustus
- Downstream analysis
  - GOSeq
  - GOStats
  - SPIA
- Commercial software
  - Partek
  - CLCBio
10 RNASeq alignment
- TopHat
  - University of Maryland (manual.shtml)
  - Python wrapper around the Bowtie aligner
  - Identifies exons without a reference database
  - Assisted or de novo transcript assembly
- STAR
  - CSHL
  - Used by the ENCODE project as its RNASeq aligner
  - Unbiased detection of splice junctions
  - Arbitrarily large intron length
  - Heuristic, non-exhaustive algorithm
11 FASTQ: Sequence Data
- FASTA with qualities
- Phred quality score (derived from the probability that the corresponding base call is incorrect) with a +33 or +64 offset, recorded as an ASCII character
- Example record (sequence, separator and quality lines):
    GGGGGGAAGTCGGCAAAATAGATCCGTAACTTCGGG
    +HWI-EAS225:3:1:2:854#0/1
    a`abbbbabaabbababb^`[aaa`_n]b^ab^``a
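The offset encoding described on this slide can be illustrated with a minimal Python sketch. The function names are hypothetical, and the choice of the +64 offset for the slide's quality line is an assumption (its characters lie in the range typical of that encoding); the relation Q = -10 log10(P) is the standard Phred definition.

```python
import math


def decode_phred(quality, offset=33):
    """Convert an ASCII-encoded quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Invert the Phred definition Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def phred_from_probability(p):
    """Phred score for a given base-call error probability."""
    return -10 * math.log10(p)


# Quality line from the slide; the +64 offset is an assumption.
quality = "a`abbbbabaabbababb^`[aaa`_n]b^ab^``a"
scores = decode_phred(quality, offset=64)
print(scores[:5])
print(error_probability(scores[0]))
```

Decoding with the wrong offset shifts every score by 31, so it is worth checking which encoding a pipeline emits before filtering on quality.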
12 SAM(BAM): Alignment Data
- Fields, in order: Read ID, Bitwise flag, Chr, Pos, MapQ, CIGAR, Mate ref, Mate pos, Insert size, Sequence, Scores, Extra tags
- Example record:
    S35_42763_ 4 0 X M * 0 0 CACACGATTCTCAAAGGT IIIIIIIIIIIIIIIIII XA:i:0
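As a rough illustration of the field layout above, the following Python sketch splits one tab-separated SAM line into its eleven mandatory fields and decodes the bitwise flag. The function names and the example line are hypothetical (the position, MapQ and CIGAR values are made up for illustration); the flag bit meanings follow the SAM specification.

```python
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]

# Bit meanings as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line):
    """Split one alignment line into named mandatory fields plus optional tags."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    record["tags"] = cols[11:]
    return record


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Hypothetical record: pos, mapq and CIGAR are illustrative values only.
example = "read1\t16\tX\t100\t255\t18M\t*\t0\t0\tCACACGATTCTCAAAGGT\tIIIIIIIIIIIIIIIIII\tXA:i:0"
record = parse_sam_line(example)
print(record["rname"], record["pos"], decode_flag(record["flag"]))
```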
13 Statistics and Algorithms
- Aim: to detect changes between the experimental conditions of interest that are significantly larger than the technical and biological variability among replicates
- Short reads distribution
  - Poisson
  - Negative binomial
  - Normal
- Expression values normalization
  - FPKM
  - Normalized read numbers
  - VST (variance-stabilizing transformation)
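The practical difference between the Poisson and negative binomial models listed above is how the variance grows with the mean. A minimal Python sketch, assuming the common parameterisation in which the negative binomial variance is the mean plus a dispersion term times the squared mean (function names are hypothetical):

```python
def poisson_variance(mean):
    """Under a Poisson model the variance equals the mean (shot noise only)."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Negative binomial: shot noise plus extra variability that scales with mean squared."""
    return mean + dispersion * mean ** 2


def squared_cv(variance, mean):
    """Squared coefficient of variation: variance divided by the squared mean."""
    return variance / mean ** 2


for mean in (10, 100, 1000):
    nb_var = negative_binomial_variance(mean, dispersion=0.1)
    print(mean, poisson_variance(mean), nb_var, squared_cv(nb_var, mean))
```

For highly expressed genes the squared coefficient of variation approaches the dispersion value, which is why a Poisson model tends to underestimate variability between biological replicates.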
14 FPKM (RPKM): Expression Values
- Fragments (Reads) Per Kilobase of exon model per Million mapped fragments
- Mortazavi A et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008.

$$\mathrm{FPKM} = \frac{10^{9}\, C}{N L}$$

- C = the number of reads mapped onto the gene's exons
- N = the total number of reads in the experiment
- L = the sum of the exon lengths in base pairs
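The formula on this slide translates directly into code. The sketch below is a minimal Python version using the slide's symbols C, N and L; the function name and the example numbers are hypothetical.

```python
def fpkm(exon_reads, total_reads, exon_length_bp):
    """FPKM = 1e9 * C / (N * L), with C, N and L as defined on the slide."""
    if total_reads <= 0 or exon_length_bp <= 0:
        raise ValueError("total_reads and exon_length_bp must be positive")
    return 1e9 * exon_reads / (total_reads * exon_length_bp)


# Hypothetical gene: 500 reads on 2,000 bp of exons in a 20-million-read run.
print(fpkm(500, 20_000_000, 2000))
```

The factor of 10^9 combines the two scalings in the name: 10^3 for "per kilobase" and 10^6 for "per million mapped fragments".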
15 Read counts
- HTSeq-count
  - Python script producing raw read counts from sorted SAM files
- BEDTools
  - coverageBed computes both the depth and breadth of coverage of features in file A across the features in file B
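To make the counting step more concrete, here is a deliberately simplified Python sketch of assigning reads to genes by interval overlap. It is not the HTSeq-count implementation: it ignores strand, exon structure and mapping quality, and all names and coordinates are hypothetical. Reads overlapping more than one gene are set aside as ambiguous, which is how overlapping genes end up in a separate category in the sanity-check plots.

```python
from collections import Counter


def count_reads(reads, genes):
    """Count reads per gene using half-open intervals.

    reads: iterable of (chrom, start, end)
    genes: dict mapping gene name -> (chrom, start, end)
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = [name for name, (g_chrom, g_start, g_end) in genes.items()
                if g_chrom == chrom and start < g_end and g_start < end]
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) == 1:
            counts[hits[0]] += 1
        else:
            counts["__ambiguous"] += 1  # read overlaps several genes
    return counts


# Hypothetical annotation with two overlapping genes.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 150, 300)}
reads = [("chr1", 110, 140), ("chr1", 160, 190),
         ("chr1", 250, 286), ("chr1", 400, 436)]
print(dict(count_reads(reads, genes)))
```

A linear scan over all genes is fine for a sketch; a real counter would use an interval index so that each read is checked only against nearby features.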
16 Sanity checks
- Read counts by category
- Counts distribution
- Pairwise correlation
[Figure: Read Assignment by Category; number of reads (millions) per category: alignment_not_unique, ambiguous, no_feature, not_aligned, too_low_aQual, Ensembl genes]
[Figure: Normalised count distributions; density against log2 normalised counts for samples WTCHG_52442_273 to WTCHG_52442_277, with pairwise sample comparisons]
17 Cufflinks package
- Cufflinks is a program that assembles aligned RNA-Seq reads into transcripts, estimates their abundances, and tests for differential expression and regulation transcriptome-wide
- Cuffcompare:
  - Transcripts comparison (de novo/genome annotation)
- Cuffdiff:
  - Differential expression analysis
18 Cufflinks (Expression analysis)
[Table: per-gene output with columns gene_id, bundle_id, chr, left, right, FPKM, FPKM_conf_lo, FPKM_conf_hi, status; rows are Ensembl gene IDs with status OK]
19 Cuffdiff (differential expression)
- Pairwise or time series comparison
- Normal distribution of read counts
- Fisher's test
- Output columns: test_id, gene, locus, sample_1, sample_2, status, value_1, value_2, ln(fold_change), test_stat, p_value, significant

    gene    locus   sample_1  sample_2  status  significant
    TSPAN6  chrX    q1        q2        NOTEST  no
    TNMD    chrX    q1        q2        NOTEST  no
    DPM1    chr20   q1        q2        NOTEST  no
    SCYL3   chr1    q1        q2        OK      yes
20 R/Bioconductor Packages
- Based on raw read counts per gene/transcript/genome feature (miRNA)
- DESeq
  - Negative binomial distribution
- baySeq (bayseq.html)
  - Bayesian approach
  - Choice of Poisson and negative binomial distribution
- edgeR
- DEGseq
- Genominator
21 DESeq: Noise and Variance estimation
[Figure: squared coefficient of variation (SCV) against base mean on a log scale for the B, IFN, M and NK samples, with the base mean density]
- SCV: the ratio of the variance at base level to the square of the base mean
- Solid line: biological replicates noise
- Dotted line: full variance scaled by size factors
- Shot noise: dotted minus solid
22 DESeq: Differential Expression
[Figure: res_m_i$log2FoldChange plotted against res_m_i$baseMean (log scale)]
[Table: columns id, B cells expression, IFG expression, log2FoldChange, pvalue; rows are Ensembl gene IDs]
23 Alternative splicing analysis
- Cufflinks
- MISO
  - Probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples
- DEXSeq (DEXSeq.html)
  - Differential exon usage
24 Cufflinks: Alternative splicing
[Table: per-transcript output with columns trans_id, bundle_id, chr, left, right, FPKM, FMI, frac, FPKM_conf_lo, FPKM_conf_hi, coverage, length, effective_length, status; rows are Ensembl transcript IDs with status OK]
25 DEXSeq
- The statistical model is based on generalised linear models of the negative binomial family (NB-GLMs)
- Exon-oriented read counts
26 Visualization: Genome Viewers
- Visualize read alignments and analysis results
- Manual check of computational predictions: expression levels, alternative splicing, variants
- Track-based visual presentation of data
- Custom tracks upload: BAM, BED, BigWig, GTF
- Web based:
  - GBrowse
  - UCSC Genome Browser
- Standalone:
  - Integrative Genomics Viewer, IGV (software/igv/)
27 UCSC Genome Browser
[Figure: UCSC Genome Browser view of SOD1 on chr21 (hg19, 33,033,000 to 33,041,000) with the following tracks]
- UCSC Genes (RefSeq, UniProt, CCDS, Rfam, tRNAs & Comparative Genomics)
- RefSeq Genes
- Publications: Sequences in scientific articles
- Human mRNAs from GenBank
- Human ESTs That Have Been Spliced
- H3K27Ac Mark (Often Found Near Active Regulatory Elements) on 7 cell lines from ENCODE
- Digital DNaseI Hypersensitivity Clusters in 125 cell types from ENCODE
- Transcription Factor ChIP-seq from ENCODE
- Placental Mammal Basewise Conservation by PhyloP
- Multiz Alignments of 46 Vertebrates
- Simple Nucleotide Polymorphisms (dbSNP 137) Found in >= 1% of Samples
- Repeating Elements by RepeatMasker
28 IGV: Differential Expression Visualization
29 Downstream analysis and bias corrections
- Bias correction
  - RNASeqBias (RNAseqRPackage/)
    - Gene length bias
    - GC content bias
    - Dinucleotide bias
- GO enrichment
  - GOStats
    - Initially a microarray package, but can be used for RNASeq
  - GOSeq
    - Detects Gene Ontology and/or other user-defined categories that are over- or under-represented in RNA-seq data
30 Pathway analysis
- SPIA (SPIA.html)
  - Signaling Pathway Impact Analysis (SPIA) uses the information from a list of differentially expressed genes and their log fold changes, together with the topology of signaling pathways, to identify the pathways most relevant to the condition under study
  - KEGG pathways database
  - Human and mouse only
31 Part II: Practical demonstration
- The aim of this demo is to use the DESeq package for RNAseq data analysis. The dataset was produced by a gene expression study of different types of immune cells, namely B-cells and monocytes. We have a total of 8 samples, 4 from B-cells and 4 from monocytes.
- Prerequisites:
  - R (version > )
  - Bioconductor
  - DESeq
32 Input data
- Raw read counts prepared with htseq-count

    gene  075_b_cell  083_b_cell  088_b_cell  085_b_cell  085_monocyte  075_monocyte  083_monocyte  088_monocyte
    ENSG  0           0           0           0           0             0             1             0
    ENSG  23          12          9           12          14            4             14            12
    ENSG  48          26          10          17          19            5             8             12
33 Read and normalize data
- Load the package and read the count table, using the gene column as row names:
    library(DESeq)
    countsTable <- read.delim("raw_counts.txt", header=TRUE, stringsAsFactors=TRUE)
    rownames(countsTable) <- countsTable$gene
    countsTable <- countsTable[, -1]
- The next step is to create a conditions vector that attributes each column to a cell type, B for B-cells and M for monocytes:
    conds <- c(rep("B", 4), rep("M", 4))
- Then we create the main data structure for the count data set using the function newCountDataSet:
    cds <- newCountDataSet(countsTable, conds)
- And normalize the read counts by estimating per-sample size factors:
    cds <- estimateSizeFactors(cds)
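The size-factor normalization step can be illustrated outside R. The Python sketch below follows the median-of-ratios idea described in the DESeq paper (each sample's factor is the median ratio of its counts to the per-gene geometric mean across samples); it is a simplified illustration with a hypothetical function name, not the package's code. The example rows are the three count rows shown on the input data slide.

```python
import math
from statistics import median


def estimate_size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of per-gene rows, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [math.exp(median(r)) for r in log_ratios]


# The three count rows from the input data slide (first row is skipped).
counts = [
    [0, 0, 0, 0, 0, 0, 1, 0],
    [23, 12, 9, 12, 14, 4, 14, 12],
    [48, 26, 10, 17, 19, 5, 8, 12],
]
size_factors = estimate_size_factors(counts)
normalized = [[c / s for c, s in zip(row, size_factors)] for row in counts]
print([round(s, 3) for s in size_factors])
```

Using the median rather than the total read count keeps a handful of very highly expressed genes from dominating the scaling.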
34 Estimate variance and dispersion
- Finally, we estimate the variance functions for the dataset:
    cds <- estimateDispersions(cds, method="per-condition", sharingMode="maximum")
- Now we find the genes that are differentially expressed between the two cell types using the negative binomial test:
    res <- nbinomTest(cds, "B", "M")
- Now we draw an MA plot of fold change against mean expression. Genes whose adjusted p-value (the padj field in res) falls below a threshold are coloured red, which gives a visual impression of the scale of differential expression:
    plot(res$baseMean, res$log2FoldChange, log="x", pch=20, cex=.1,
         col = ifelse(res$padj < .0001, "red", "black"))
35 Significant genes
- Finally, we filter out the genes whose padj exceeds the threshold and create the subset of results for the differentially expressed ones:
    sig <- res[res$padj < .001, ]
    sig <- sig[!is.na(sig$pval), ]
    head(sig[with(sig, order(padj)), ])
- Select the 50 most significant genes for functional annotation analysis:
    noquote(head(sig[with(sig, order(padj)), ]$id, 50))
36 Downstream analysis
- The Database for Annotation, Visualization and Integrated Discovery (DAVID) is a powerful resource for functional annotation analysis
- We are going to use it to check whether any important functional categories describe the differentially expressed genes we detected.
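The padj column used for filtering holds p-values adjusted for multiple testing; DESeq reports Benjamini-Hochberg adjusted values. A minimal Python sketch of that adjustment and of the subsequent filtering is given below. The function name, gene labels and p-values are hypothetical, and missing values are not handled.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
genes = ["g1", "g2", "g3", "g4"]
pvals = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(pvals)

# Keep genes below a chosen threshold, most significant first.
threshold = 0.03
significant = sorted((p, g) for p, g in zip(padj, genes) if p < threshold)
print(significant)
```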
More informationBioinformatics Chapter 1. Introduction
Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More information10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison
10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:
More informationGBS Bioinformatics Pipeline(s) Overview
GBS Bioinformatics Pipeline(s) Overview Getting from sequence files to genotypes. Pipeline Coding: Ed Buckler Jeff Glaubitz James Harriman Presentation: Terry Casstevens With supporting information from
More informationOverview - MS Proteomics in One Slide. MS masses of peptides. MS/MS fragments of a peptide. Results! Match to sequence database
Overview - MS Proteomics in One Slide Obtain protein Digest into peptides Acquire spectra in mass spectrometer MS masses of peptides MS/MS fragments of a peptide Results! Match to sequence database 2 But
More informationTRANSCRIPTOMICS. (or the analysis of the transcriptome) Mario Cáceres. Main objectives of genomics. Determine the entire DNA sequence of an organism
TRANSCRIPTOMICS (or the analysis of the transcriptome) Mario Cáceres Main objectives of genomics Determine the entire DNA sequence of an organism Identify and annotate the complete set of genes encoded
More informationTesting High-Dimensional Count (RNA-Seq) Data for Differential Expression
Testing High-Dimensional Count (RNA-Seq) Data for Differential Expression Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 6 1 References Anders & Huber (2010), Differential
More informationIntroduction to Bioinformatics
CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics
More informationBioinformatics. Dept. of Computational Biology & Bioinformatics
Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS
More informationPackage chimeraviz. November 29, 2017
Type Package Title Visualization tools for gene fusions Version 1.4.0 Package chimeraviz November 29, 2017 chimeraviz manages data from fusion gene finders and provides useful visualization tools. License
More informationGBS Bioinformatics Pipeline(s) Overview
GBS Bioinformatics Pipeline(s) Overview Getting from sequence files to genotypes. Pipeline Coding: Ed Buckler Jeff Glaubitz James Harriman Presentation: Rob Elshire With supporting information from the
More informationCycle «Analyse de données de séquençage à haut-débit»
Cycle «Analyse de données de séquençage à haut-débit» Module 1/5 Analyse ADN Chadi Saad CRIStAL - Équipe BONSAI - Univ Lille, CNRS, INRIA (chadi.saad@univ-lille.fr) Présentation de Sophie Gallina (source:
More informationPG Diploma in Genome Informatics onwards CCII Page 1 of 6
PG Diploma in Genome Informatics 2014-15 onwards CCII Page 1 of 6 BHARATHIAR UNIVERSITY, COIMBATORE 641046 CENTRE FOR COLLABORATION OF INDUSTRY AND INSTITUTION(CCII) PG DIPLOMA IN GENOME INFORMATICS (For
More informationStatistical Methods for Functional Genomics Studies Using Observational Data
Statistical Methods for Functional Genomics Studies Using Observational Data Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School
More informationGenomics and bioinformatics summary. Finding genes -- computer searches
Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence
More informationUnit-free and robust detection of differential expression from RNA-Seq data
Unit-free and robust detection of differential expression from RNA-Seq data arxiv:405.4538v [stat.me] 8 May 204 Hui Jiang,2,* Department of Biostatistics, University of Michigan 2 Center for Computational
More informationGene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji
Gene Regula*on, ChIP- X and DNA Mo*fs Statistics in Genomics Hongkai Ji (hji@jhsph.edu) Genetic information is stored in DNA TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCTC
More informationEBI web resources II: Ensembl and InterPro
EBI web resources II: Ensembl and InterPro Yanbin Yin http://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to http://www.ebi.ac.uk/interpro/training.htmland finish the second online training course
More informationEBSeq: An R package for differential expression analysis using RNA-seq data
EBSeq: An R package for differential expression analysis using RNA-seq data Ning Leng, John Dawson, and Christina Kendziorski October 14, 2013 Contents 1 Introduction 2 2 Citing this software 2 3 The Model
More informationDaphnia magna. Genetic and plastic responses in
Genetic and plastic responses in Daphnia magna Comparison of clonal differences and environmental stress induced changes in alternative splicing and gene expression. Jouni Kvist Institute of Biotechnology,
More informationTaxonomical Classification using:
Taxonomical Classification using: Extracting ecological signal from noise: introduction to tools for the analysis of NGS data from microbial communities Bergen, April 19-20 2012 INTRODUCTION Taxonomical
More informationDeciphering regulatory networks by promoter sequence analysis
Bioinformatics Workshop 2009 Interpreting Gene Lists from -omics Studies Deciphering regulatory networks by promoter sequence analysis Elodie Portales-Casamar University of British Columbia www.cisreg.ca
More informationSPH 247 Statistical Analysis of Laboratory Data. April 28, 2015 SPH 247 Statistics for Laboratory Data 1
SPH 247 Statistical Analysis of Laboratory Data April 28, 2015 SPH 247 Statistics for Laboratory Data 1 Outline RNA-Seq for differential expression analysis Statistical methods for RNA-Seq: Structure and
More informationMixtures and Hidden Markov Models for analyzing genomic data
Mixtures and Hidden Markov Models for analyzing genomic data Marie-Laure Martin-Magniette UMR AgroParisTech/INRA Mathématique et Informatique Appliquées, Paris UMR INRA/UEVE ERL CNRS Unité de Recherche
More information