1 Decomposition of ESG


DiffSplice resolves alternative splicing events in complex gene models by decomposing the splice graph, represented as an expressed segment graph (ESG). Figure 1 shows the hierarchical decomposition of gene VEGFA. In total, 6 alternative splicing modules (ASMs) result from the decomposition.

Figure 1: Gene model and decomposition of gene VEGFA.

Following is the pseudo-code for the algorithm to decompose an ESG. Here d+(u) and d−(v) denote the out-degree of u and the in-degree of v.

Algorithm 1: Find maximal edges in an ESG G (CalculateMaximalEdges(G))

    input : G = <V, E, ts, te, w>
    output: E_max

    E_max ← any edge {e ∈ E};
    for all e1 = (u1, v1) ∈ E do
        for all e2 = (u2, v2) ∈ E_max do
            if there is a path from u1 to u2 and a path from v2 to v1 then
                E_max ← (E_max \ {e2}) ∪ {e1};
            end
        end
    end

Algorithm 2: Find the ASMs in an ESG G (Decompose(G, P))

    input : An ESG G = <V, E, ts, te, w>, parent P
    output: The set of ASMs A

    Calculate pre-dominators in G;
    Calculate post-dominators in G;
    Candidate_entry ← {u : d+(u) > 1};
    Candidate_exit ← {v : d−(v) > 1};
    for all u ∈ Candidate_entry do
        v ← the immediate post-dominator of u;
        if v ∈ Candidate_exit and u is the immediate pre-dominator of v then
            parent(H(u, v)) ← P;
            A ← A ∪ {H(u, v)};
            E_max ← CalculateMaximalEdges(H(u, v));
            Decompose(H(u, v) \ E_max, H(u, v));
        end
    end
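The maximal-edge search in Algorithm 1 can be sketched in Python as follows. This is one possible reading of the pseudo-code, not the DiffSplice implementation: an edge is treated as maximal when no other edge spans it, that is, when no other edge (u1, v1) has a path from u1 to its tail and a path from its head to v1. The function names `reachable_sets` and `maximal_edges`, and the toy graph, are hypothetical.

```python
from collections import defaultdict


def reachable_sets(nodes, edges):
    """Map every node to the set of nodes reachable from it (itself included)."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    reach = {}
    for start in nodes:
        seen, stack = {start}, [start]
        while stack:  # iterative depth-first search
            x = stack.pop()
            for y in succ[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        reach[start] = seen
    return reach


def maximal_edges(nodes, edges):
    """Return the edges that are not spanned by any other edge."""
    reach = reachable_sets(nodes, edges)

    def spans(e1, e2):
        # e1 = (u1, v1) spans e2 = (u2, v2) if u1 reaches u2 and v2 reaches v1.
        (u1, v1), (u2, v2) = e1, e2
        return e1 != e2 and u2 in reach[u1] and v1 in reach[v2]

    return [e2 for e2 in edges if not any(spans(e1, e2) for e1 in edges)]


# Toy graph: the edge (s, t) skips the chain s -> a -> b -> t.
toy_nodes = ["s", "a", "b", "t"]
toy_edges = [("s", "a"), ("a", "b"), ("b", "t"), ("s", "t")]
print(maximal_edges(toy_nodes, toy_edges))
```

In the toy graph only the skipping edge survives, which is the edge Algorithm 2 would remove before recursing into the remaining subgraph.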

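2 Abundance estimation in ASM

Consider an ASM with $n$ alternative transcription paths and $m$ features (exonic segments and splice junctions). We define $A_{t,e}$ as an indicator for the presence of a feature $e$ in transcription path $t$, with value 1 if $t$ covers $e$ and 0 otherwise. The indicators for the presence of every exon/junction in each path form an $n \times m$ indicator matrix $A$.

2.1 Derivation of likelihood function

Let $C^e_t$ denote the coverage on the $e$th feature from the $t$th path. Under the independence assumption, the likelihood can be factorized as

\[
\begin{aligned}
L(q, N \mid C^1, \dots, C^m)
&= P(C^1_1, \dots, C^1_n, C^2_1, \dots, C^2_n, \dots, C^m_1, \dots, C^m_n \mid q, N) \\
&= \prod_{t=1}^{n} P(C^1_t, C^2_t, \dots, C^m_t) \\
&= \prod_{t=1}^{n} P(C^1_t, C^2_t, \dots, C^m_t \mid N_t)\, P(N_t) \\
&= \prod_{t=1}^{n} \Big[ \prod_{i=1}^{m} P(C^i_t \mid N_t) \Big] P(N_t) \\
&= \prod_{t=1}^{n} \Big[ \prod_{i=1}^{m} f(C^i_t \mid N_t) \Big] g(N_t),
\end{aligned}
\]

where $f(\cdot)$ is the density of $N\!\left(C_t, \frac{r (l_t - l_e) C_t}{l_t l_e}\right)$ and $g(\cdot)$ is the density of $\mathrm{Poisson}(\lambda_t)$, $\lambda_t = N p_t$.

2.2 Maximum likelihood estimators

The maximum likelihood estimators for $q$ and $N$ are the ones that maximize the likelihood,

\[
(\hat{q}, \hat{N}) = \arg\max_{q, N} L(q, N \mid \text{data}).
\]

The log-likelihood is

\[
\begin{aligned}
l(q, N \mid C^1, \dots, C^m)
&= \log L(q, N \mid C^1, \dots, C^m) \\
&= \sum_{t=1}^{n} \Big[ \log g(N_t) + \sum_{i=1}^{m} \log f(C^i_t \mid N_t) \Big] \\
&= \sum_{t=1}^{n} \Big\{ \log \frac{e^{-\lambda_t} \lambda_t^{N_t}}{N_t!}
 + \sum_{i=1}^{m} \log \Big[ \frac{1}{\sqrt{2\pi r (l_t - l_i) C_t / (l_t l_i)}}
 \exp\Big( -\frac{(C^i_t - C_t)^2}{2 r (l_t - l_i) C_t / (l_t l_i)} \Big) \Big] \Big\} \\
&= \sum_{t=1}^{n} \Big\{ -\lambda_t + N_t \log \lambda_t - \log N_t!
 + \sum_{i=1}^{m} \Big[ \tfrac{1}{2} \log l_t + \tfrac{1}{2} \log l_i - \tfrac{1}{2} \log 2\pi - \tfrac{1}{2} \log r \\
&\qquad\qquad - \tfrac{1}{2} \log (l_t - l_i) - \tfrac{1}{2} \log C_t
 - \frac{(C^i_t - C_t)^2}{2 r (l_t - l_i) C_t / (l_t l_i)} \Big] \Big\}.
\end{aligned}
\]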
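To make the structure of the log-likelihood concrete, the sketch below evaluates the contribution of a single path $t$: one Poisson term for the read count plus one normal term per feature. It is an illustration under stated assumptions rather than the DiffSplice code: the function name and arguments are hypothetical, $r$ is taken to be the read length (as the relation $N_t = C_t l_t / r$ used in the EM step suggests), and every feature is assumed to be shorter than the path so that the variance is positive.

```python
import math


def path_log_likelihood(n_reads, lam, mean_cov, feature_cov, path_len, feature_len, r):
    """Log-likelihood contribution of one path.

    n_reads     -- N_t, number of reads on the path
    lam         -- lambda_t, the Poisson mean
    mean_cov    -- C_t, mean coverage of the path
    feature_cov -- list of C_t^i, coverage of each feature from this path
    path_len    -- l_t
    feature_len -- list of l_i (each assumed smaller than l_t)
    r           -- assumed to be the read length
    """
    # Poisson term: -lambda + N log(lambda) - log(N!)
    ll = -lam + n_reads * math.log(lam) - math.lgamma(n_reads + 1)
    for c, l_i in zip(feature_cov, feature_len):
        # Variance r (l_t - l_i) C_t / (l_t l_i) of the normal density f
        var = r * (path_len - l_i) * mean_cov / (path_len * l_i)
        ll += -0.5 * math.log(2 * math.pi * var) - (c - mean_cov) ** 2 / (2 * var)
    return ll


# Hypothetical numbers: 100 reads of length 50 on a 500 bp path give C_t = 10.
example_ll = path_log_likelihood(100, 100.0, 10.0, [9.0, 11.0], 500, [100, 200], 50)
print(example_ll)
```

Summing this quantity over all $n$ paths gives the full log-likelihood $l(q, N \mid C^1, \dots, C^m)$.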

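2.3 EM algorithm for deriving estimators

The expectation maximization (EM) algorithm to find the maximum likelihood estimators $\hat{q}$ and $\hat{N}$ is detailed as follows.

1. E-step: Denoting the value of $q_t$ at step $v$ as $q_t^{(v)}$, we first calculate the conditional expectation of $C_t$ given $q_t^{(v)}$. Let $C^{(1)}, C^{(2)}, \dots, C^{(m')}$ be the read coverage of the exonic segments that are in path $t$, i.e., $A_{t,e} = 1$ if $e \in \{(1), (2), \dots, (m')\}$ and $A_{t,e} = 0$ otherwise. Let $\hat{C}^e_t$ denote the expected coverage on exonic segment $e$ from $t$,

\[
\hat{C}^e_t = \frac{p_e\, q_t^{(v)} A_{t,e}}{\sum_{j=1}^{n} p_e\, q_j^{(v)} A_{j,e}}\; C^e .
\]

Let $k_{t,e}$ denote $\frac{r (l_t - l_e)}{l_t l_e}$, so we have $C^e_t \sim N(C_t, k_{t,e} C_t)$. Therefore, the conditional expectation of $C_t$ is the maximum likelihood estimator that maximizes the joint density of the $m'$ normal densities,

\[
\hat{C}_t = E_{q_t^{(v)}}\!\left[ C_t \mid C^{(1)}, C^{(2)}, \dots, C^{(m')} \right]
= \frac{-m' + \sqrt{m'^2 + 4 \sum_{i=1}^{m'} k_{t,(i)}^{-1} \sum_{i=1}^{m'} k_{t,(i)}^{-1} \big(\hat{C}^{(i)}_t\big)^2}}{2 \sum_{i=1}^{m'} k_{t,(i)}^{-1}} .
\]

The expected number of reads on path $t$ is hence calculated as $\hat{N}_t = \hat{C}_t l_t / r$.

2. M-step: Then we derive the parameters that maximize the conditional likelihood given $\hat{N}_t$.

Set $\partial l / \partial N$ to 0:

\[
\sum_{t=1}^{n} \hat{N}_t - \hat{N} = 0
\quad\Longrightarrow\quad
\hat{N} = \sum_{t=1}^{n} \hat{N}_t .
\]

Set $\partial l / \partial q_t$ to 0:

\[
\sum_{t=1}^{n} \Big( -\frac{d\lambda_t}{d\hat{q}_t} + N_t \frac{1}{\lambda_t} \frac{d\lambda_t}{d\hat{q}_t} \Big) = 0
\quad\Longrightarrow\quad
\sum_{t=1}^{n} \Big( \Big( \frac{N_t}{\lambda_t} - 1 \Big) \frac{d\lambda_t}{d\hat{q}_t} \Big) = 0 ,
\]

which gives the update

\[
\hat{q}_t^{(v)} = \frac{\hat{N}_t \sum_{j=1, j \neq t}^{n} \hat{q}_j^{(v-1)} \big( \sum_{i=1}^{m} p_i A_{j,i} \big)}{(\hat{N} - \hat{N}_t) \big( \sum_{i=1}^{m} p_i A_{t,i} \big)} .
\]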
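The two closed-form pieces of the EM iteration can be sketched as small Python functions. This is a hedged illustration, not the DiffSplice source. The E-step function solves the quadratic that results from maximizing the product of normal densities $N(C_t, k_{t,(i)} C_t)$ over $C_t$ (setting the derivative of the log density to zero gives $C_t^2 \sum_i k_{t,(i)}^{-1} + m' C_t - \sum_i k_{t,(i)}^{-1} (\hat{C}^{(i)}_t)^2 = 0$). The M-step function applies the update for $\hat{q}_t$ in terms of $\hat{N}_t$, $\hat{N}$, $p_i$ and $A$. All function and argument names are hypothetical, and the example numbers are made up.

```python
import math


def e_step_mean_coverage(expected_cov, k):
    """Positive root of C^2 * sum(1/k_i) + m' * C - sum(c_i^2 / k_i) = 0."""
    m_prime = len(expected_cov)
    inv_k_sum = sum(1.0 / k_i for k_i in k)
    weighted_sq = sum(c * c / k_i for c, k_i in zip(expected_cov, k))
    disc = m_prime * m_prime + 4.0 * inv_k_sum * weighted_sq
    return (-m_prime + math.sqrt(disc)) / (2.0 * inv_k_sum)


def expected_reads(mean_cov, path_len, r):
    """N_t = C_t * l_t / r."""
    return mean_cov * path_len / r


def m_step_update(q_prev, n_hat, p, A):
    """One update of q; assumes at least two paths with nonzero read counts."""
    # s[t] = sum_i p_i * A[t][i]
    s = [sum(p_i * a for p_i, a in zip(p, row)) for row in A]
    n_total = sum(n_hat)  # N = sum_t N_t
    q_new = []
    for t in range(len(q_prev)):
        others = sum(q_prev[j] * s[j] for j in range(len(q_prev)) if j != t)
        q_new.append(n_hat[t] * others / ((n_total - n_hat[t]) * s[t]))
    return q_new


# Hypothetical two-path example where q is already consistent with the counts.
c_hat = e_step_mean_coverage([10.0, 10.0], [0.1, 0.1])
q_next = m_step_update([0.25, 0.75], [25.0, 75.0], [1.0, 1.0], [[1, 0], [0, 1]])
print(c_hat, q_next)
```

In the example the path proportions already match the read counts, so the M-step leaves them unchanged, which is the behavior expected at a fixed point of the iteration.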

3 Statistical test for differential transcription

Jensen-Shannon divergence. Let $p = (p_1, \dots, p_t)^T$ and $q = (q_1, \dots, q_t)^T$ be two $t$-dimensional distributions. The Jensen-Shannon divergence (JSD) is calculated as

\[
\mathrm{JSD}(p \,\|\, q) = \frac{\mathrm{KLD}(p \,\|\, \mu) + \mathrm{KLD}(q \,\|\, \mu)}{2},
\]

where $\mathrm{KLD}(p \,\|\, q) = \sum_{j=1}^{t} p_j \log \frac{p_j}{q_j}$ and $\mu = (p + q)/2$.
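The JSD definition translates directly into a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the natural logarithm is assumed, and terms with $p_j = 0$ are skipped using the usual convention $0 \log 0 = 0$.

```python
import math


def kld(p, q):
    """Kullback-Leibler divergence sum_j p_j log(p_j / q_j), with 0 log 0 = 0."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    mu = [(pj + qj) / 2 for pj, qj in zip(p, q)]
    return (kld(p, mu) + kld(q, mu)) / 2


# Hypothetical path distributions of one ASM in two samples.
sample_a = [0.7, 0.2, 0.1]
sample_b = [0.2, 0.3, 0.5]
print(math.sqrt(jsd(sample_a, sample_b)))
```

Because $\mu_j > 0$ whenever $p_j > 0$ or $q_j > 0$, the divisions inside `kld` are always defined. The example prints the square root of the JSD, the quantity plotted in the simulation figures.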

4 Biological meanings and applications of ASM

Here we give three examples to demonstrate that the investigation of ASMs may reveal functional sequences. The first two examples (ERBB4 and VEGFA) show significant sequences residing in single ASMs, while the third example (CD44) shows an isoform transition associated with multiple ASMs.

In Figure 2 we plot the ASM in gene ERBB4. ASM1 indicates an exon skipping event that alternatively includes or excludes exon E3. The skipping path (p2), which corresponds to the CYT-2 isoform in ERBB4, deletes a WW binding motif, leading to increased cell proliferation. [3]

Figure 2: The splice graph and the ASM decomposition of gene ERBB4.

We take gene VEGFA as another example, which has 6 ASMs with a complex nesting structure. Bainbridge et al. have identified a 7-amino acid peptide, RKRKKSR, encoded by exon E10. [1] This peptide could inhibit VEGF receptor binding and angiogenesis in vitro. In Figure 1 we show the ASMs in gene VEGFA. ASM3.1 captures the alternative inclusion/exclusion of E10. Thus, this ASM shows that some isoforms of VEGFA lack this important peptide sequence.

Lastly, we look at two isoforms in gene CD44, CD44s and CD44v. Isoform CD44s includes exons E1-E5, E14-E17 and E18, and CD44v includes exons E1-E5, E6-E13, E14-E17 and E18 (Figure 3). Brown et al. have suggested that a shift in CD44 expression from variant isoforms (CD44v) to the standard isoform (CD44s) is essential in epithelial cell development and is associated with breast cancer progression. [2] The alternative exons by which CD44s and CD44v differ, E6-E13, are captured by three ASMs, ASM4, ASM5 and ASM6, where CD44s takes path p1 in ASM4 and CD44v takes path p2 in all of ASM4, ASM5 and ASM6. Therefore, the joint analysis of all three ASMs will be essential for the study of the isoform transition in this gene.

Figure 3: The splice graph and the ASM decomposition of gene CD44.

References

[1] James Bainbridge, Haiyan Jia, Azadeh Bagherzadeh, David Selwood, Robin Ali, and Ian Zachary. A peptide encoded by exon 6 of VEGF (EG3306) inhibits VEGF-induced angiogenesis in vitro and ischaemic retinal neovascularisation in vivo. Biochemical and Biophysical Research Communications, 302(4):793-9.

[2] Rhonda Brown, Lauren Reinke, Marin Damerow, Denise Perez, Lewis Chodosh, Jing Yang, and Chonghui Cheng. CD44 splice isoform switching in human and mouse epithelium is essential for epithelial-mesenchymal transition and breast cancer progression. The Journal of Clinical Investigation, 121(3).

[3] Rebecca Muraoka-Cook, Melissa Sandahl, Karen Strunk, Leah Miraglia, Carty Husted, Debra Hunter, Klaus Elenius, Lewis Chodosh, and H. Shelton Earp. ErbB4 splice variants Cyt1 and Cyt2 differ by 16 amino acids and exert opposing effects on the mammary epithelium in vivo. Molecular and Cellular Biology, 29(18).

5 Simulation datasets

5.1 Gene VEGFA

We simulated 100 runs of experiments on this gene. In each run, 2 sets of RNA-seq reads were generated from 2 independently created transcript expression profiles. Every set contained 50K 50bp single-end reads. In Figure 4a, every dot represents an ASM in one run. For all ASMs, the divergence estimated by DiffSplice is very close to the profile divergence, with a Pearson correlation as high as [...]. This precision in quantifying sample-to-sample divergence results from the accuracy of the path abundance estimation. Figure 4b plots the distribution of the MSE between path distributions for every ASM. All 6 ASMs have the majority of their MSE below [...], with mean close to 0 and small variance, showing the accuracy of the abundance estimator developed in DiffSplice.

Figure 4: Evaluation of DiffSplice on simulated dataset of gene VEGFA. (a) Comparison between the difference calculated from the sampling profile and the difference estimated by DiffSplice, measured by the square root of JSD (axes: Sqrt of Profile JSD, Sqrt of DiffSplice JSD). The Pearson correlation is [...]. (b) The mean squared error (MSE) between the sampling profile and the estimated alternative path distribution, averaged between the two samples, plotted per ASM ID. The abundance estimation procedure of DiffSplice has very low error on all 6 ASMs.

5.2 Human transcriptome

Following the UCSC human hg19 gene annotation, two sets of RNA-seq reads were generated by sampling from the whole human transcriptome with different transcript expression profiles. Each dataset consisted of 50M 50bp single-end reads. Genes with an average read coverage per base greater than 10 were picked to compare the difference given by the profile with the difference derived by DiffSplice. The majority of the points stay close to the diagonal, where the DiffSplice JSD and the profile JSD are equal, resulting in a correlation of [...] (Figure 5a).

The variance of the difference between DiffSplice JSD and profile JSD is larger at ASMs with similar profiles in the two samples (i.e. ASMs with low profile JSD) and decreases for ASMs with higher divergence between profiles. This observation follows from the nonlinearity of the JSD: compared to the Euclidean distance, the JSD gives a larger value for small differences and a smaller value for greater differences. Because of the randomness of the read sampling procedure, the sampled reads may deviate from the profile expression. Therefore the differences measured by JSD might be slightly inflated when the true difference is low. However, the MSE in path abundance estimation still mainly stays below 0.01 (Figure 5b). As coverage increases, the deviation between the estimated path distribution and the profile distribution converges to 0 and the variance also tends to decrease, consistent with an unbiased and asymptotically efficient abundance estimator.

Figure 5: Evaluation of DiffSplice on simulated dataset of human transcriptome. (a) Comparison between the difference calculated from the sampling profile and the difference estimated by DiffSplice, measured by the square root of JSD (axes: Sqrt of Profile JSD, Sqrt of DiffSplice JSD). The Pearson correlation is [...]. (b) The mean squared error (MSE) between the sampling profile and the estimated alternative path distribution, averaged between the two samples, plotted against expression level. ASMs are separated into 10 quantile groups according to their expression level. ASMs with higher expression have less estimation error.
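The error measure used in Figures 4b and 5b can be illustrated with a short Python sketch. The exact normalization is not spelled out in the text, so the version below is an assumption: the MSE of one sample is taken as the mean squared difference over the paths of an ASM, and the reported value is the average over the two samples. Function names and numbers are hypothetical.

```python
def path_distribution_mse(profile, estimate):
    """Mean squared difference between two path distributions of one ASM."""
    return sum((a - b) ** 2 for a, b in zip(profile, estimate)) / len(profile)


def averaged_mse(profiles, estimates):
    """Average the per-sample MSE over all samples (two in the simulations)."""
    per_sample = [path_distribution_mse(p, e) for p, e in zip(profiles, estimates)]
    return sum(per_sample) / len(per_sample)


# Hypothetical ASM with two paths, observed in two samples.
true_profiles = [[0.5, 0.5], [0.2, 0.8]]
estimated = [[0.6, 0.4], [0.2, 0.8]]
print(averaged_mse(true_profiles, estimated))
```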

6 Real datasets

6.1 qRT-PCR validation

RNA was isolated from the cell lines using the standard TRIzol protocol (Invitrogen, Inc.). RNA was reverse transcribed into cDNA using an iScript cDNA synthesis kit exactly according to the manufacturer's instructions (Bio-Rad, Hercules, CA). Expression of the target genes TMC5 and LMO7, and of TBP as a normalizing control, was measured by real-time PCR using 20 ng template cDNA, forward and reverse primers at a final concentration of 500 nM each, and SsoFast EvaGreen Supermix with low ROX (Bio-Rad, Hercules, CA). The total reaction volume was 20 µL. Reactions were run on an Applied Biosystems 7500HT thermocycler under the following conditions: denaturation at 95 °C for 30 seconds, followed by 40 cycles of denaturation at 95 °C for 5 seconds and annealing/extension at 60 °C for 30 seconds. Relative expression levels were calculated by the delta-delta Ct method.

Figure 6: Relative splice variant expression at day 3 and day 35 from the PCR validation (bars: D3-IN, D3-EX, D35-IN, D35-EX for TMC5 and LMO7; y-axis: fold change relative to TBP).
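For readers unfamiliar with the delta-delta Ct calculation, a minimal Python sketch follows. It uses the standard form of the method, fold change $= 2^{-\Delta\Delta C_t}$, which assumes roughly 100% amplification efficiency for both target and reference; the text does not state which variant was applied. The function name and the Ct values are hypothetical, with the reference gene playing the role of TBP.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample, ct_target_calib, ct_ref_calib):
    """Relative expression of a target in a sample versus a calibrator condition."""
    # Delta Ct: target normalized to the reference gene within each condition.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calib = ct_target_calib - ct_ref_calib
    # Delta-delta Ct: sample relative to calibrator; each cycle is one doubling.
    return 2 ** -(delta_sample - delta_calib)


# Hypothetical Ct values: day 35 as the sample, day 3 as the calibrator.
print(fold_change_ddct(24.0, 20.0, 26.0, 20.0))
```

A lower Ct means earlier detection and hence more template, which is why the exponent carries a negative sign.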

6.2 Lung differentiation dataset

Figure 7: Exon skipping event identified by DiffSplice in gene LMO7 (chr13; coverage tracks for three replicates each at day 3 and day 35, with the DiffSplice splice graph and RefSeq genes). The skipping variant (ASM2.path1) had significantly higher relative abundance at day 35 (78%) than at day 3 (28%), consistent with the result of the qRT-PCR experiment.

Figure 8: Alternative transcription start sites identified by DiffSplice in gene HNRNPF (chr10; coverage tracks for three replicates each at day 3 and day 35, with the DiffSplice splice graph, Cufflinks transcripts and RefSeq genes). DiffSplice correctly reconstructed all 6 alternative transcription start sites in the RefSeq annotation and called the differential transcription in this event a significant change. The alternative path ASM1.path4 (corresponding to the 5th transcript in the RefSeq annotation) had significantly higher expression at day [...]

6.3 Breast cancer dataset

Figure 9: The Venn diagram of the differentially transcribed genes called by DiffSplice and FDM on the breast cancer dataset. The number of shared genes is 955, which is 38.1% of the result of DiffSplice and 45.7% of the result of FDM.

Figure 10: Exon skipping event identified by DiffSplice but not by FDM in gene ESYT2 (chr7, hg19; coverage tracks for four MCF7 and four SUM102 samples, with the DiffSplice splice graph and RefSeq genes). The skipping variant (ASM1.path1) had significantly higher relative abundance in the SUM102 group than in the MCF7 group.

Figure 11: Exon skipping event identified by DiffSplice but not by FDM in gene MYOF (chr10, hg19; coverage tracks for four MCF7 and four SUM102 samples, with the DiffSplice splice graph and RefSeq genes). Three ASMs were found in this gene. The skipping variant (ASM3.path1) in ASM3 had significantly higher relative abundance in the MCF7 group than in the SUM102 group.

Figure 12: Alternative transcription start sites identified by DiffSplice but not by FDM in gene RGS3 (chr9, hg19; coverage tracks for four MCF7 and four SUM102 samples, with the DiffSplice splice graph and RefSeq genes). In the MCF7 group, the earliest start site (ASM1.path1) was expressed but the second start site (ASM1.path2) was barely expressed. In the SUM102 group, the earliest start site was barely expressed but the second start site was expressed.

Figure 13: Alternative transcription start sites identified by DiffSplice but not by FDM in gene SHC1 (chr1, hg19; coverage tracks for four MCF7 and four SUM102 samples, with the DiffSplice splice graph and RefSeq genes). In the MCF7 group, the start site ASM4.path1 had low expression but the start site ASM4.path2 was highly expressed, as compared to the overall gene expression level. In the SUM102 group, expression switched from ASM4.path2 to ASM4.path1.


More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

Bioinformatics 2 - Lecture 4

Bioinformatics 2 - Lecture 4 Bioinformatics 2 - Lecture 4 Guido Sanguinetti School of Informatics University of Edinburgh February 14, 2011 Sequences Many data types are ordered, i.e. you can naturally say what is before and what

More information

1. In most cases, genes code for and it is that

1. In most cases, genes code for and it is that Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod

More information

CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression

CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression CSEP 590A Summer 2006 Lecture 4 MLE, EM, RE, Expression 1 FYI, re HW #2: Hemoglobin History Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm

More information

CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators

CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators CSEP 59A Summer 26 Lecture 4 MLE, EM, RE, Expression FYI, re HW #2: Hemoglobin History 1 Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm

More information

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin

More information

Differential expression analysis for sequencing count data. Simon Anders

Differential expression analysis for sequencing count data. Simon Anders Differential expression analysis for sequencing count data Simon Anders RNA-Seq Count data in HTS RNA-Seq Tag-Seq Gene 13CDNA73 A2BP1 A2M A4GALT AAAS AACS AADACL1 [...] ChIP-Seq Bar-Seq... GliNS1 4 19

More information

Quantitative Biology Lecture 3

Quantitative Biology Lecture 3 23 nd Sep 2015 Quantitative Biology Lecture 3 Gurinder Singh Mickey Atwal Center for Quantitative Biology Summary Covariance, Correlation Confounding variables (Batch Effects) Information Theory Covariance

More information

Regulation of Gene Expression

Regulation of Gene Expression Chapter 18 Regulation of Gene Expression Edited by Shawn Lester PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley

More information

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department

More information

Supplementary Figure 1 Analysis of beige fat and cells and characteristics of exosome release, related to Figure 1

Supplementary Figure 1 Analysis of beige fat and cells and characteristics of exosome release, related to Figure 1 Supplementary Figure 1 Analysis of beige fat and cells and characteristics of exosome release, related to Figure 1 (a) Fold-change in UCP-1 mrna abundance in white adipocytes upon β-adrenergic stimulation

More information

Correspondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics of conserved exons

Correspondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics of conserved exons Gao and Li BMC Genomics (2017) 18:234 DOI 10.1186/s12864-017-3600-2 RESEARCH ARTICLE Open Access Correspondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

A Robust Method for Transcript Quantification with RNA-seq Data

A Robust Method for Transcript Quantification with RNA-seq Data A Robust Method for Transcript Quantification with RNA-seq Data Yan Huang 1, Yin Hu 1, Corbin D. Jones 2, James N. MacLeod 3, Derek Y. Chiang 4, Yufeng Liu 5, Jan F. Prins 6, and Jinze Liu 1 1 Department

More information

- conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster.

- conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster. NCBI BLAST Services DELTA-BLAST BLAST (http://blast.ncbi.nlm.nih.gov/), Basic Local Alignment Search tool, is a suite of programs for finding similarities between biological sequences. DELTA-BLAST is a

More information

Supplementary Information. Drought response transcriptomics are altered in poplar with reduced tonoplast sucrose transporter expression

Supplementary Information. Drought response transcriptomics are altered in poplar with reduced tonoplast sucrose transporter expression Supplementary Information Drought response transcriptomics are altered in poplar with reduced tonoplast sucrose transporter expression Liang Jiao Xue, Christopher J. Frost, Chung Jui Tsai, Scott A. Harding

More information

Comparative analysis of RNA- Seq data with DESeq2

Comparative analysis of RNA- Seq data with DESeq2 Comparative analysis of RNA- Seq data with DESeq2 Simon Anders EMBL Heidelberg Two applications of RNA- Seq Discovery Eind new transcripts Eind transcript boundaries Eind splice junctions Comparison Given

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Generation of paraxial mesoderm from the H7 hesc line.

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Generation of paraxial mesoderm from the H7 hesc line. Supplementary Figure 1 Generation of paraxial mesoderm from the H7 hesc line. H7 hescs were differentiated as shown in Figure 1a. (a) Flow cytometric analyses of the proportion of CD56+, PDGFRα+, and KDR+

More information

Honors Biology Reading Guide Chapter 11

Honors Biology Reading Guide Chapter 11 Honors Biology Reading Guide Chapter 11 v Promoter a specific nucleotide sequence in DNA located near the start of a gene that is the binding site for RNA polymerase and the place where transcription begins

More information

Section 7. Junaid Malek, M.D.

Section 7. Junaid Malek, M.D. Section 7 Junaid Malek, M.D. RNA Processing and Nomenclature For the purposes of this class, please do not refer to anything as mrna that has not been completely processed (spliced, capped, tailed) RNAs

More information

Multiple Choice Review- Eukaryotic Gene Expression

Multiple Choice Review- Eukaryotic Gene Expression Multiple Choice Review- Eukaryotic Gene Expression 1. Which of the following is the Central Dogma of cell biology? a. DNA Nucleic Acid Protein Amino Acid b. Prokaryote Bacteria - Eukaryote c. Atom Molecule

More information

Markov Models & DNA Sequence Evolution

Markov Models & DNA Sequence Evolution 7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under

More information

Discovering molecular pathways from protein interaction and ge

Discovering molecular pathways from protein interaction and ge Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why

More information

Lecture 15: Programming Example: TASEP

Lecture 15: Programming Example: TASEP Carl Kingsford, 0-0, Fall 0 Lecture : Programming Example: TASEP The goal for this lecture is to implement a reasonably large program from scratch. The task we will program is to simulate ribosomes moving

More information

Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00.

Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00. Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00. Promoters and Enhancers Systematic discovery of transcriptional regulatory motifs

More information

Singular Value Decomposition and Principal Component Analysis (PCA) I

Singular Value Decomposition and Principal Component Analysis (PCA) I Singular Value Decomposition and Principal Component Analysis (PCA) I Prof Ned Wingreen MOL 40/50 Microarray review Data per array: 0000 genes, I (green) i,i (red) i 000 000+ data points! The expression

More information

Genome-wide modelling of transcription kinetics reveals patterns of RNA production delays arxiv: v2 [q-bio.

Genome-wide modelling of transcription kinetics reveals patterns of RNA production delays arxiv: v2 [q-bio. Genome-wide modelling of transcription kinetics reveals patterns of RNA production delays arxiv:153.181v2 [q-bio.gn] 16 Jul 215 Antti Honkela 1, Jaakko Peltonen 2,3, Hande Topa 2, Iryna Charapitsa 4, Filomena

More information

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype Lecture Series 7 From DNA to Protein: Genotype to Phenotype Reading Assignments Read Chapter 7 From DNA to Protein A. Genes and the Synthesis of Polypeptides Genes are made up of DNA and are expressed

More information

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016 Boolean models of gene regulatory networks Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016 Gene expression Gene expression is a process that takes gene info and creates

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis: Protein synthesis uses the information in genes to make proteins. 2 Steps

More information

7.32/7.81J/8.591J: Systems Biology. Fall Exam #1

7.32/7.81J/8.591J: Systems Biology. Fall Exam #1 7.32/7.81J/8.591J: Systems Biology Fall 2013 Exam #1 Instructions 1) Please do not open exam until instructed to do so. 2) This exam is closed- book and closed- notes. 3) Please do all problems. 4) Use

More information

How much non-coding DNA do eukaryotes require?

How much non-coding DNA do eukaryotes require? How much non-coding DNA do eukaryotes require? Andrei Zinovyev UMR U900 Computational Systems Biology of Cancer Institute Curie/INSERM/Ecole de Mine Paritech Dr. Sebastian Ahnert Dr. Thomas Fink Bioinformatics

More information

Identifying Bio-markers for EcoArray

Identifying Bio-markers for EcoArray Identifying Bio-markers for EcoArray Ashish Bhan, Keck Graduate Institute Mustafa Kesir and Mikhail B. Malioutov, Northeastern University February 18, 2010 1 Introduction This problem was presented by

More information

Proteomics Systems Biology

Proteomics Systems Biology Dr. Sanjeeva Srivastava IIT Bombay Proteomics Systems Biology IIT Bombay 2 1 DNA Genomics RNA Transcriptomics Global Cellular Protein Proteomics Global Cellular Metabolite Metabolomics Global Cellular

More information

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007 -2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open

More information

Supplementary Figure S1

Supplementary Figure S1 Supplementary Figure S1 KRO STO R-p R-p3 RUD R-nt L-n L-h L-l ph 7 WEON [mg/kg dw soil] 3 WEOC [mg/kg dw soil] 3 Soil type sandy sandy loam silty clay GSF B77V B77T E16 Figure S1:

More information

Energy and Cellular Metabolism

Energy and Cellular Metabolism 1 Chapter 4 About This Chapter Energy and Cellular Metabolism 2 Energy in biological systems Chemical reactions Enzymes Metabolism Figure 4.1 Energy transfer in the environment Table 4.1 Properties of

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 147 Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data Sunduz

More information

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Lecture 18 June 2 nd, Gene Expression Regulation Mutations Lecture 18 June 2 nd, 2016 Gene Expression Regulation Mutations From Gene to Protein Central Dogma Replication DNA RNA PROTEIN Transcription Translation RNA Viruses: genome is RNA Reverse Transcriptase

More information

Comparative Network Analysis

Comparative Network Analysis Comparative Network Analysis BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by

More information

Expression arrays, normalization, and error models

Expression arrays, normalization, and error models 1 Epression arrays, normalization, and error models There are a number of different array technologies available for measuring mrna transcript levels in cell populations, from spotted cdna arrays to in

More information

ChIP seq peak calling. Statistical integration between ChIP seq and RNA seq

ChIP seq peak calling. Statistical integration between ChIP seq and RNA seq Institute for Computational Biomedicine ChIP seq peak calling Statistical integration between ChIP seq and RNA seq Olivier Elemento, PhD ChIP-seq to map where transcription factors bind DNA Transcription

More information

Sig2GRN: A Software Tool Linking Signaling Pathway with Gene Regulatory Network for Dynamic Simulation

Sig2GRN: A Software Tool Linking Signaling Pathway with Gene Regulatory Network for Dynamic Simulation Sig2GRN: A Software Tool Linking Signaling Pathway with Gene Regulatory Network for Dynamic Simulation Authors: Fan Zhang, Runsheng Liu and Jie Zheng Presented by: Fan Wu School of Computer Science and

More information

A Simple Protein Synthesis Model

A Simple Protein Synthesis Model A Simple Protein Synthesis Model James K. Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University September 3, 213 Outline A Simple Protein Synthesis Model

More information

Statistics for Differential Expression in Sequencing Studies. Naomi Altman

Statistics for Differential Expression in Sequencing Studies. Naomi Altman Statistics for Differential Expression in Sequencing Studies Naomi Altman naomi@stat.psu.edu Outline Preliminaries what you need to do before the DE analysis Stat Background what you need to know to understand

More information

Single Cell Sequencing

Single Cell Sequencing Single Cell Sequencing Fundamental unit of life Autonomous and unique Interactive Dynamic - change over time Evolution occurs on the cellular level Robert Hooke s drawing of cork cells, 1665 Type Prokaryotes

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

scrna-seq Differential expression analysis methods Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden October 2017

scrna-seq Differential expression analysis methods Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden October 2017 scrna-seq Differential expression analysis methods Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden October 2017 Olga (NBIS) scrna-seq de October 2017 1 / 34 Outline Introduction: what

More information

Causal Discovery by Computer

Causal Discovery by Computer Causal Discovery by Computer Clark Glymour Carnegie Mellon University 1 Outline 1. A century of mistakes about causation and discovery: 1. Fisher 2. Yule 3. Spearman/Thurstone 2. Search for causes is statistical

More information

The Blessing and the Curse

The Blessing and the Curse The Blessing and the Curse of the Multiplicative Updates Manfred K. Warmuth University of California, Santa Cruz CMPS 272, Feb 31, 2012 Thanks to David Ilstrup and Anindya Sen for helping with the slides

More information

Molecular evolution - Part 1. Pawan Dhar BII

Molecular evolution - Part 1. Pawan Dhar BII Molecular evolution - Part 1 Pawan Dhar BII Theodosius Dobzhansky Nothing in biology makes sense except in the light of evolution Age of life on earth: 3.85 billion years Formation of planet: 4.5 billion

More information