An Information-Theoretic Approach to Methylation Data Analysis

Size: px
Start display at page:

Download "An Information-Theoretic Approach to Methylation Data Analysis"

Transcription

1 An Information-Theoretic Approach to Methylation Data Analysis John Goutsias Whitaker Biomedical Engineering Institute The Johns Hopkins University Baltimore, MD 21218

2 Objective Present an information-theoretic approach to methylation data analysis. This leads to: Employing the Shannon entropy to quantify epigenetic stochasticity. Predicting large-scale chromatin organization (compartments A/B). Predicting smaller-scale chromatin organization (TADs). Using an information-theoretic distance to quantify epigenetic discordance. Developing a sensitivity analysis approach to quantify environmental influences on epigenetic stochasticity.

3 µ : demethylationprobability ν : methylation probability Methylation Maintenance DNA methylation 1 µ ν ν µ 0 0 methylation channel DNA methylation Po ut ( 0) = (1 ν ) Pin (0) +µ Pin (1) P (1) =ν P (0) + (1 µ ) P (1) out in in

4 Methylation Maintenance DNA methylation µ : demethylationprobability ν : methylation probability 1 µ ν ν µ 0 0 methylation channel Po ut ( 0) = (1 ν ) Pin (0) +µ Pin (1) P (1) =ν P (0) + (1 µ ) P (1) out in in DNA methylation Use methylation channels at each CpG site (noisy binary communication channels). Must estimate probabilities µ and ν from data Not possible! At steady-state, P out = P in and ν Pin(1) λ= = µ 1 P (1) in

5 Information Capacity of a Methylation Channel Maximum average information that can be conveyed during a maintenance step: { I X X } C = ma x, ) P(1) ( out in mutual information Mutual information: a measure of mutual dependence between input and output methylation. Available formula for capacity, but depends on probabilities µ and ν. Approximate formula can be derived that depends only on λ=ν / µ.

6 Probability of Transmission Error π = Pr[output methylation input methylation] Quantifies fidelity of methylation transmission through a methylation channel. Depends on probabilities µ and ν. An approximate formula can be derived that depend only on λ=ν / µ.

7 Energy Dissipation Correct transmission of the methylation state requires work. This consumes free energy that is dissipated to the surroundings in the form of heat. The minimum dissipated energy is related (for some systems) to the probability of error Relative dissipated energy: E~ kt B ln π E ε = log 2 π E max

8 Methylation Stochasticity We can characterize methylation stochasticity at a CpG site using the Shannon entropy: S = P(0) log P(0) P(1) log P(1) 2 2

9 Capacity, Relative Dissipated Energy, Entropy 1 ENTR RDE CAP log λ 2 Reliable transmission of methylation information within critical regions of the genome is facilitated by high capacity methylation channels that result in low methylation stochasticity at the cost of high energy consumption. Unreliable methylation transmission within other regions of the genome due to low capacity methylation channels that consume less energy but produce higher levels of methylation stochasticity.

10 Capacity, Relative Dissipated Energy, Entropy Scale chr1: 2 Mb hg19 34,000,000 34,500,000 35,000,000 35,500,000 36,000,000 36,500,000 37,000,000 37,500,000 38,000,000 38,500,000 39,000,000 1 _ CAP-livernormal 0 _ 10 _ RDE-livernormal 0 _ 1 _ ENTR-livernormal 0 _

11 Capacity, Relative Dissipated Energy, Entropy Scale chr1: 1 _ 1 kb hg19 150,601, ,602, ,602, ,603, ,603,500 CAP-livernormal 0 _ 15 _ RDE-livernormal 0 _ 1 _ ENTR-livernormal 0 _ CGI UCSC Genes (RefSeq, GenBank, CCDS, Rfam, trnas & Comparative Genomics) ENSA polycomb target gene proposed as candidate for type 2 diabetes

12 Capacity, Relative Dissipated Energy, Entropy NORMAL/CANCER YOUNG/OLD density lungnor mal-3 lungcancer-3 density CD4-Y3 CD4-O CAP CAP density density density RDE density RDE ENTR ENTR

13 Capacity, Relative Dissipated Energy, Entropy Scale chr1: 0.2 _ 2 Mb hg19 34,000,000 34,500,000 35,000,000 35,500,000 36,000,000 36,500,000 37,000,000 37,500,000 38,000,000 38,500,000 39,000,000 dcap-livercancer-vs-livernormal _ 8 _ drde-livercancer-vs-livernormal _ 1 _ dentr-livercancer-vs-livernormal _

14 Capacity, Relative Dissipated Energy, Entropy Scale chr1: 1 kb hg19 150,601, ,602, ,602, ,603, ,603, _ dcap-livercancer-vs-livernormal _ 8 _ drde-livercancer-vs-livernormal _ 0.6 _ dentr-livercancer-vs-livernormal _ CGI UCSC Genes (RefSeq, GenBank, CCDS, Rfam, trnas & Comparative Genomics) ENSA polycomb target gene proposed as candidate for type 2 diabetes

15 Information-Theoretic Prediction of Chromatin Organization Chromatin is organized in cell-type specific compartments A and B. Scale 20Mb EBV hg19 chr1 190,000, ,000,000 Hi-C A B Genes 2015, 6(3), Compartment A: associated with gene-rich transcriptionally active open chromatin. Compartment B: associated with gene-poor transcriptionally inactive chromatin.

16 Information-Theoretic Prediction of Chromatin Organization Scale 20Mb EBV hg EBV 12 chr1 190,000, ,000,000 Hi-C A B information capacity relative dissipated energy methylation entropy ENTR CAP RDE A B A B 0.00 Enrichment of low information capacity, low relative dissipated energy, and high entropy within compartment B! Opposite true for compartment A!

17 Information-Theoretic Prediction of Chromatin Organization Random forest regression, using information-theoretic quantities as features, learns the informational structure of compartments A/B from available groundtruth data. Achieves reliable prediction: 0.82 cross-validated average correlation and 91% average agreement between predicted and true A/B signals. A small number of local information-theoretic properties of methylation maintenance can be highly predictive of large-scale chromatin organization!

18 Information-Theoretic Prediction of Chromatin Organization 0.5 stem Clustered 34 samples using the net percentage of A/B switching as a dissimilarity measure. 31/34 samples grouped in a biologically meaningful manner! height fibro-p4 fibro-p7 fibro-p10 fibro-p31 fibro-p33 livercancer-1 livercancer-2 livercancer-3 livernormal-1 livernormal-2 livernormal-3 livernormal-5 livernormal-4 lungnormal-1 lungnormal-2 lungnormal-3 lungcancer-3 lungcancer-1 colonnormal lungcancer-2 * * CD4-Y1 coloncancer CD4-O1 CD4-O3 CD4-O2 CD4-Y3 CD4-Y2 brain-1 brain-2 * ker-y2 ker-y1 ker-o1 ker-o2

19 Quantifying Epigenetic Stochasticity Source of epigenetic stochasticity: Transmission of epigenetic information during maintenance through low capacity methylation channels that consume low levels of free energy. Quantify epigenetic stochasticity using the concept of Shannon entropy. To account for methylation dependence within a genomic region, must consider the joint Shannon entropy Note that H ( X, X,..., X ) = px (, x,..., x )log p( x, x,..., x ) 1 2 N 1 2 N N x, x,..., x N 1 2 N 1, n-th CpG site is methylated = 0, n-th CpG site is unmethylated H( X, X,..., X ) H( X ) + H( X ) + + H( X ) X n N

20 Methylation Level Partition the genome into non-overlapping genomic units of 150 bp each (determines resolution of analysis). Instead of focusing on methylation patterns within a genomic unit, we quantify methylation using the methylation level L 1 N Xn N n = 1 = X n 1, n-th CpG site is methylated = 0, n-th CpG site is unmethylated Compute the probability distribution Ising model. Pl () of the methylation level from the

21 Mean Methylation Level The mean methylation level (MML) is the average of the methylation means at the CpG sites within a genomic unit N 1 E[ L] = E[ Xn] N n 1

22 Normalized Methylation Entropy Quantify epigenetic stochasticity within each genomic unit using the normalized methylation entropy (NME) Ranges between 0 and 1 0: fully ordered state 1: fully disordered state h 1 = ()log 2 () log ( N + 1) Pl Pl l 2

23 entropy = mean log 2 (mean) (1 mean)log (1 mean) 2 Normalized Methylation Entropy MML not predictive of NME!

24 Genome-wide MML and NME Distributions stem colonnormal coloncancer livernormal-1 livercancer-1 livernormal-2 livercancer-2 livernormal-3 livercancer-3 lungnormal-1 lungcancer-1 lungnormal-2 lungcancer-2 lungnormal-3 lungcancer-3 brain-1 brain-2 fibro-p4 fibro-p7 fibro-p10 fibro-p31 fibro-p33 CD4 Y1 CD4 Y2 CD4 Y3 CD4 O1 CD4 O2 CD4 O3 ker-y1 ker-y2 ker-o1 ker-o2 livernormal-4 livernormal MML NME

25 Topologically Associating Domains Genes 2015, 6(3), Topologically associating domains (TADs): highly conserved structural features of the genome across tissue types and species. Loci within TADs tend to interact frequently with each other. Less frequent interactions take place between loci within adjacent TADs.

26 Entropy Blocks Large genomic regions of consistently low or high NME values. ordered entropy block disordered entropy block H1-TAD boundaries Scale chr1: 5 Mb hg19 30,000,000 35,000,000 40,000,000 colonnormal coloncancer livernormal-1 livercancer-1 livernormal-2 livercancer-2 livernormal-3 livercancer-3 lungnormal-1 lungcancer-1 lungnormal-2 lungcancer-2 lungnormal-3 lungcancer-3

27 Entropy Blocks Large genomic regions of consistently low or high NME values. ordered entropy block disordered entropy block H1-TAD boundaries Scale chr1: colonnormal coloncancer livernormal-1 livercancer-1 livernormal-2 livercancer-2 livernormal-3 livercancer-3 lungnormal-1 lungcancer-1 lungnormal-2 lungcancer-2 lungnormal-3 lungcancer-3 5 Mb hg19 30,000,000 35,000,000 40,000,000 Known TAD boundary annotations are visually proximal to boundaries of entropy blocks!

28 Identification of TAD boundaries phenotype 1 TAD TAD TAD TAD TAD TAD TAD TAD TAD disordered disordered ordered ordered TADs are associated with consistently low or high NME values, and can be accurately identified from entropy blocks. phenotype 3 phenotype 2 TAD TAD TAD TAD TAD TAD TAD TAD TAD disordered disordered disordered ordered ordered TAD TAD TAD TAD TAD TAD TAD TAD TAD disordered disordered ordered ordered

29 Quantifying Epigenetic Discordance Evaluate differences in probability distributions of methylation levels within genomic units using a distance metric. GENOMIC UNIT NORMAL CANCER probability probability P () c l 0 1/7 2/7 3/7 4/7 5/7 6/7 1 methylation level P () n l 0 1/7 2/7 3/7 4/7 5/7 6/7 1 methylation level DISTANCE?

30 Jensen-Shannon Distance GENOMIC UNIT NORMAL CANCER probability probability P () c l 0 1/7 2/7 3/7 4/7 5/7 6/7 1 methylation level P () n l 0 1/7 2/7 3/7 4/7 5/7 6/7 1 methylation level D 1 Pn() l + Pc() l Pn() l + Pc() l = D ( P ( l), ) DKL( P( l), ) JS KL n c D ( PQ, ) Pl ( )log KL 0 D 1 JS Pl () = 2 l Ql ()

31 Jensen-Shannon Distance 1 CANCER probability /7 2/7 3/7 4/7 5/7 6/7 1 methylation level 0 1/7 2/7 3/7 4/7 5/7 6/7 1 methylation level 0 1/7 2/7 3/7 4/7 5/7 6/7 1 methylation level 1 NORMAL probability /7 2/7 3/7 4/7 5/7 6/7 1 methylation level 0 1/7 2/7 3/7 4/7 5/7 6/7 1 methylation level 0 1/7 2/7 3/7 4/7 5/7 6/7 1 methylation level JSD = 1.0 dmml = 0.7 dnme = 0.0 JSD mainly driven by difference in MML JSD = 0.9 dmml = 0.0 dnme = 0.6 JSD mainly driven by difference in NME JSD = 1.0 dmml = 0.0 dnme = 0.0 JSD driven by factors other than MML or NME

32 Jensen-Shannon Distance 1 _ JSD-lungcancer-VS-lungnormal SALL3 0 _ 1 _ dmml-lungcancer-vs-lungnormal _ 1 _ dnme-lungcancer-vs-lungnormal _

33 Jensen-Shannon Distance 1 _ 0 _ 1 _ JSD-lungcancer-VS-lungnormal dmml-lungcancer-vs-lungnormal SALL3 SAL3 has been implicated in lung cancer _ 1 _ dnme-lungcancer-vs-lungnormal _ 1 _ JSD-lungcancer-VS-lungnormal 0 _ CGI 1 _ UCSC Genes (RefSeq, GenBank, CCDS, Rfam, trnas & Comparative Genomics) SALL3 dmml-lungcancer-vs-lungnormal _ 1 _ dnme-lungcancer-vs-lungnormal _

34 Jensen-Shannon Distance colonnormal-vs-stem lungcancer-1-vs-lungnormal Jensen-Shannon distance Jensen-Shannon distance No consistent containment of epigenetic dissimilarity within a particular genomic feature genome-wide promoters exons introns 5UTRs 3UTRs intergenic CGIs shores shelves open sea uc-enhancers uo-enhancers p-enhancers r-enhancers genome-wide promoters exons introns 5UTRs 3UTRs intergenic CGIs shores shelves open sea uc-enhancers uo-enhancers p-enhancers r-enhancers CD4-Y2-VS-stem 0.8 CD4-Y2-1-VS-lungnormal-3 Jensen-Shannon distance Jensen-Shannon distance brain-1-vs-stem brain-1-vs-lungnormal Jensen-Shannon distance Jensen-Shannon distance

35 1.00 Jensen-Shannon Distance Jensen-Shannon distance A B 0.00 coloncancer vs colonnormal livercancer1 vs livernormal-1 livercancer-2 vs livernormal-2 livercancer-3 vs livernormal-3 lungcancer-1 vs lungnormal-1 lungcancer-2 vs lungnormal-2 lungcancer-3 vs lungnormal-3 Larger epigenetic discordance is observed in compartment B than in A.

36 Quantified epigenetic discordance between two samples by calculating the average of all JSD values computed genome-wide. Computed this measure of epigenetic discordance between 17 representative samples. Formed the corresponding dissimilarity matrix. Used multidimensional scaling to find a 2D configuration of points whose interpoint distances approximately correspond to the epigenetic dissimilarities among the samples. Jensen-Shannon Distance

37 Jensen-Shannon Distance Quantified epigenetic discordance between two samples by calculating the average of all JSD values computed genome-wide. Computed this measure of epigenetic discordance between 17 representative samples. Formed the corresponding dissimilarity matrix. Used multidimensional scaling to find a 2D configuration of points whose interpoint distances approximately correspond to the epigenetic dissimilarities among the samples. MESODERM ECTODERM differentiation ENDODERM lungcancer-3 lungnor mal-3 lungcancer-1 lungnor mal-1 CD4-O2CD4-O1 CD4-Y3 liver normal-1 CD4-Y2 colonnor mal coloncancer brain-1 stem liver nor mal-2 brain-2 carcinogenesis // grouping of samples into categories based on lineage // livercancer-2 livercancer-1

38 Sensitivity Analysis Epigenetic changes integrate environmental signals with genetic variation to modulate phenotype. Quantify influence of environmental exposure on methylation stochasticity. View environmental variability as a process that directly influences the parameters of the methylation potential energy landscape (Ising parameters).

39 Entropic Sensitivity Index Consider the case for which the Ising parameters θ fluctuate around their true values by a random amount G θ, where G is a zero-mean normal distribution with standard deviation σ. Can show that η σ σh variation in entropy (NME) hg ( ) η= g g = 0 environmental variation Entropic sensitivity index (ESI)

40 Sensitivity Analysis entropic sensitivity index stem colonnormal coloncancer livernormal-1 livercancer-1 livernormal-2 livercancer-2 livernormal-3 livercancer-3 livernormal-4 livernormal-5 lungnormal-1 lungcancer-1 lungnormal-2 lungcancer-2 lungnormal-3 lungcancer-3 brain-1 brain-2 fibro-p4 fibro-p7 fibro-p10 fibro-p31 fibro-p33 CD4 Y1 CD4 Y2 CD4 Y3 CD4 O1 CD4 O2 CD4 O3 ker-y1 ker-y2 ker-o1 ker-o2 Globally observed differences in entropic sensitivity among tissues.

41 Sensitivity Analysis entropic sensitivity index compartment A compartment B stem colonnormal coloncancer livernormal-1 livercancer-1 livernormal-2 livercancer- 2 livernormal-3 livercancer-3 livernormal-4 livernormal-5 lungnormal-1 lungcancer-1 lungnormal-2 lungcancer2 lungnormal-3 lungcancer-3 brain-1 brain-2 fibro-p4 fibro- P7 fibro-p10 fibro-p31 fibro-p33 CD4 Y1 CD4 Y2 CD4 Y3 CD4 O1 CD4 O2 CD4 O3 ker-y1 ker-y2 ker-o1 ker-o2 Entropic sensitivity within compartment A is higher than in compartment B.

42 Sensitivity Analysis entropic sensitivity index compartment A compartment B colonnormal coloncancer livernormal-1 livercancer-1 livernormal-2 livercancer-2 livernormal-3 Observed differences between normal and cancer are largely confined to compartment B. livercancer-3 livernormal-4 livernormal-5 lungnormal-1 lungcancer-1 lungnormal-2 lungcancer2 lungnormal-3 lungcancer-3

43 Sensitivity Analysis (Normal-Cancer) Scale 20 Mb hg19 chr1: 25,000,000 30,000,000 35,000,000 40,000,000 45,000,000 50,000,000 55,000,000 60,000,000 65,000, _ ESI-colonnormal 0 _ 1.5 _ ESI-coloncancer 0 _ 0.5 _ desi-coloncancer-vs-colonnormal _ 1.5 _ ESI-lungnormal-1 0 _ 1.5 _ ESI-lungcancer-1 0 _ 0.5 _ desi-lungcancer-1-vs-lungnormal _

44 Sensitivity Analysis (Normal-Cancer) Scale 5 kb hg19 chr1: 150,595, ,597, ,599, ,601, ,603, ,605,000 1 _ NME-livernormal-1 0 _ 1 _ NME-livercancer-1 0 _ 3 _ ESI-livernormal-1 0 _ 3 _ ESI-livercancer-1 0 _ 3 _ desi-livercancer-1-vs-livernormal _ CGI ENSA ENSA ENSA ENSA ENSA ENSA ENSA ENSA UCSC Genes (RefSeq, GenBank, CCDS, Rfam, trnas & Comparative Genomics) ENSA ENSA polycomb target gene proposed as a candidate gene for type 2 diabetes

45 Sensitivity Analysis (Young-Old) Scale 100 Mb hg19 chr1: 50,000, ,000, ,000, ,000, _ desi-cd4-y2-vs-cd4-y _ 0.5 _ desi-cd4-y3-vs-cd4-y _ 0.5 _ desi-cd4-y3-vs-cd4-y _ 0.5 _ desi-cd4-o3-vs-cd4-y _ 0.5 _ desi-cd4-o3-vs-y _ 0.5 _ desi-cd4-o3-vs-cd4-y _

46 Sensitivity Analysis (Young-Old) Scale 1 kb hg19 chr21: 27,541,000 27,541,500 27,542,000 27,542,500 27,543,000 27,543,500 27,544,000 27,544,500 2 _ ESI-CD4-Y1 0 _ 2 _ ESI-CD4-Y2 0 _ 2 _ ESI-CD4-Y3 0 _ CGI 2 _ UCSC Genes (RefSeq, GenBank, CCDS, Rfam, trnas & Comparative Genomics) desi-cd4-o3-vs-cd4-y1 APP amyloid precursor protein gene implicated in Alzheimer s _ 2 _ desi-cd4-o3-vs-cd4-y _ 2 _ desi-cd4-o3-vs-y _

47 Conclusion An information-theoretic approach to epigenomics based on the Ising model provides a formal foundation to methylation analysis. Yields fundamental insights into epigenetic behavior. Demonstrates a relationship between chromatin structure methylation channels entropic sensitivity This may maximize an organism s efficiency in storing epigenetic information and help explain developmental plasticity.

48 Conclusion Pluripotent stem cells require relatively high energy to maintain high capacity methylation channels within a portion of the genome, achieving reduced methylation stochasticity. Regions characterized by increased entropic sensitivity are associated with highly deformable potential energy landscapes, which may correspond to differentiation branch points. After differentiation, some large genomic domains, such as regions associated with pluripotency, need not maintain high channel capacities and energy consumption their sequestration providing increased energy efficiency with the cost of high epigenetic stochasticity and reduced responsiveness.

Quantitative Biology Lecture 3

Quantitative Biology Lecture 3 23 nd Sep 2015 Quantitative Biology Lecture 3 Gurinder Singh Mickey Atwal Center for Quantitative Biology Summary Covariance, Correlation Confounding variables (Batch Effects) Information Theory Covariance

More information

Information in Biology

Information in Biology Lecture 3: Information in Biology Tsvi Tlusty, tsvi@unist.ac.kr Living information is carried by molecular channels Living systems I. Self-replicating information processors Environment II. III. Evolve

More information

Information in Biology

Information in Biology Information in Biology CRI - Centre de Recherches Interdisciplinaires, Paris May 2012 Information processing is an essential part of Life. Thinking about it in quantitative terms may is useful. 1 Living

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

TRANSCRIPTOMICS. (or the analysis of the transcriptome) Mario Cáceres. Main objectives of genomics. Determine the entire DNA sequence of an organism

TRANSCRIPTOMICS. (or the analysis of the transcriptome) Mario Cáceres. Main objectives of genomics. Determine the entire DNA sequence of an organism TRANSCRIPTOMICS (or the analysis of the transcriptome) Mario Cáceres Main objectives of genomics Determine the entire DNA sequence of an organism Identify and annotate the complete set of genes encoded

More information

How much non-coding DNA do eukaryotes require?

How much non-coding DNA do eukaryotes require? How much non-coding DNA do eukaryotes require? Andrei Zinovyev UMR U900 Computational Systems Biology of Cancer Institute Curie/INSERM/Ecole de Mine Paritech Dr. Sebastian Ahnert Dr. Thomas Fink Bioinformatics

More information

INTEGRATING EPIGENETIC PRIORS FOR IMPROVING COMPUTATIONAL IDENTIFICATION OF TRANSCRIPTION FACTOR BINDING SITES AFFAN SHOUKAT

INTEGRATING EPIGENETIC PRIORS FOR IMPROVING COMPUTATIONAL IDENTIFICATION OF TRANSCRIPTION FACTOR BINDING SITES AFFAN SHOUKAT INTEGRATING EPIGENETIC PRIORS FOR IMPROVING COMPUTATIONAL IDENTIFICATION OF TRANSCRIPTION FACTOR BINDING SITES AFFAN SHOUKAT A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT

More information

Introduction to Information Theory

Introduction to Information Theory Introduction to Information Theory Gurinder Singh Mickey Atwal atwal@cshl.edu Center for Quantitative Biology Kullback-Leibler Divergence Summary Shannon s coding theorems Entropy Mutual Information Multi-information

More information

Hidden Markov Models (I)

Hidden Markov Models (I) GLOBEX Bioinformatics (Summer 2015) Hidden Markov Models (I) a. The model b. The decoding: Viterbi algorithm Hidden Markov models A Markov chain of states At each state, there are a set of possible observables

More information

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Measures of hydroxymethylation

Measures of hydroxymethylation Measures of hydroxymethylation Alla Slynko Axel Benner July 22, 2018 arxiv:1708.04819v2 [q-bio.qm] 17 Aug 2017 Abstract Hydroxymethylcytosine (5hmC) methylation is well-known epigenetic mark impacting

More information

Analysis and visualization of protein-protein interactions. Olga Vitek Assistant Professor Statistics and Computer Science

Analysis and visualization of protein-protein interactions. Olga Vitek Assistant Professor Statistics and Computer Science 1 Analysis and visualization of protein-protein interactions Olga Vitek Assistant Professor Statistics and Computer Science 2 Outline 1. Protein-protein interactions 2. Using graph structures to study

More information

Lecture 2: August 31

Lecture 2: August 31 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy

More information

Eukaryotic Gene Expression

Eukaryotic Gene Expression Eukaryotic Gene Expression Lectures 22-23 Several Features Distinguish Eukaryotic Processes From Mechanisms in Bacteria 123 Eukaryotic Gene Expression Several Features Distinguish Eukaryotic Processes

More information

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Comparative Gene Finding. BMI/CS 776  Spring 2015 Colin Dewey Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes

More information

Chapter 9 Fundamental Limits in Information Theory

Chapter 9 Fundamental Limits in Information Theory Chapter 9 Fundamental Limits in Information Theory Information Theory is the fundamental theory behind information manipulation, including data compression and data transmission. 9.1 Introduction o For

More information

RNA Processing: Eukaryotic mrnas

RNA Processing: Eukaryotic mrnas RNA Processing: Eukaryotic mrnas Eukaryotic mrnas have three main parts (Figure 13.8): 5! untranslated region (5! UTR), varies in length. The coding sequence specifies the amino acid sequence of the protein

More information

The Physical Language of Molecules

The Physical Language of Molecules The Physical Language of Molecules How do molecular codes emerge and evolve? International Workshop on Bio Soft Matter Tokyo, 2008 Biological information is carried by molecules Self replicating information

More information

Regulation of gene Expression in Prokaryotes & Eukaryotes

Regulation of gene Expression in Prokaryotes & Eukaryotes Regulation of gene Expression in Prokaryotes & Eukaryotes 1 The trp Operon Contains 5 genes coding for proteins (enzymes) required for the synthesis of the amino acid tryptophan. Also contains a promoter

More information

Module 1. Introduction to Digital Communications and Information Theory. Version 2 ECE IIT, Kharagpur

Module 1. Introduction to Digital Communications and Information Theory. Version 2 ECE IIT, Kharagpur Module ntroduction to Digital Communications and nformation Theory Lesson 3 nformation Theoretic Approach to Digital Communications After reading this lesson, you will learn about Scope of nformation Theory

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

1/22/13. Example: CpG Island. Question 2: Finding CpG Islands

1/22/13. Example: CpG Island. Question 2: Finding CpG Islands I529: Machine Learning in Bioinformatics (Spring 203 Hidden Markov Models Yuzhen Ye School of Informatics and Computing Indiana Univerty, Bloomington Spring 203 Outline Review of Markov chain & CpG island

More information

Hidden Markov Models (HMMs) November 14, 2017

Hidden Markov Models (HMMs) November 14, 2017 Hidden Markov Models (HMMs) November 14, 2017 inferring a hidden truth 1) You hear a static-filled radio transmission. how can you determine what did the sender intended to say? 2) You know that genes

More information

SBI lab Jou-hyun Jeon Jae-seong Yang Solip Park Yonghwan Choi Yoonsup Choi Jinho Kim HyunJun Nam JiHye Hwang Inhae Kim Youngeun Shin Sung gyu Han

SBI lab Jou-hyun Jeon Jae-seong Yang Solip Park Yonghwan Choi Yoonsup Choi Jinho Kim HyunJun Nam JiHye Hwang Inhae Kim Youngeun Shin Sung gyu Han POSTECH 생명과학과김상욱 Structural Bioinformatics Laboratory Pohang University of Science and Technology Acknowledgement SBI lab Jou-hyun Jeon Jae-seong Yang Solip Park Yonghwan Choi Yoonsup Choi Jinho Kim HyunJun

More information

Quantum rate distortion, reverse Shannon theorems, and source-channel separation

Quantum rate distortion, reverse Shannon theorems, and source-channel separation Quantum rate distortion, reverse Shannon theorems, and source-channel separation ilanjana Datta, Min-Hsiu Hsieh, Mark Wilde (1) University of Cambridge,U.K. (2) McGill University, Montreal, Canada Classical

More information

QB LECTURE #4: Motif Finding

QB LECTURE #4: Motif Finding QB LECTURE #4: Motif Finding Adam Siepel Nov. 20, 2015 2 Plan for Today Probability models for binding sites Scoring and detecting binding sites De novo motif finding 3 Transcription Initiation Chromatin

More information

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia Expression QTLs and Mapping of Complex Trait Loci Paul Schliekelman Statistics Department University of Georgia Definitions: Genes, Loci and Alleles A gene codes for a protein. Proteins due everything.

More information

Casting Polymer Nets To Optimize Molecular Codes

Casting Polymer Nets To Optimize Molecular Codes Casting Polymer Nets To Optimize Molecular Codes The physical language of molecules Mathematical Biology Forum (PRL 2007, PNAS 2008, Phys Bio 2008, J Theo Bio 2007, E J Lin Alg 2007) Biological information

More information

Peter Pristas. Gene regulation in eukaryotes

Peter Pristas. Gene regulation in eukaryotes Peter Pristas BNK1 Gene regulation in eukaryotes Gene Expression in Eukaryotes Only about 3-5% of all the genes in a human cell are expressed at any given time. The genes expressed can be specific for

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

Clustering and Network

Clustering and Network Clustering and Network Jing-Dong Jackie Han jdhan@picb.ac.cn http://www.picb.ac.cn/~jdhan Copy Right: Jing-Dong Jackie Han What is clustering? A way of grouping together data samples that are similar in

More information

Lectures on Medical Biophysics Department of Biophysics, Medical Faculty, Masaryk University in Brno. Biocybernetics

Lectures on Medical Biophysics Department of Biophysics, Medical Faculty, Masaryk University in Brno. Biocybernetics Lectures on Medical Biophysics Department of Biophysics, Medical Faculty, Masaryk University in Brno Norbert Wiener 26.11.1894-18.03.1964 Biocybernetics Lecture outline Cybernetics Cybernetic systems Feedback

More information

INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM. Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld

INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM. Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld University of Illinois at Chicago, Dept. of Electrical

More information

GENE REGULATION AND PROBLEMS OF DEVELOPMENT

GENE REGULATION AND PROBLEMS OF DEVELOPMENT GENE REGULATION AND PROBLEMS OF DEVELOPMENT By Surinder Kaur DIET Ropar Surinder_1998@ yahoo.in Mob No 9988530775 GENE REGULATION Gene is a segment of DNA that codes for a unit of function (polypeptide,

More information

Markov Chains and Hidden Markov Models. = stochastic, generative models

Markov Chains and Hidden Markov Models. = stochastic, generative models Markov Chains and Hidden Markov Models = stochastic, generative models (Drawing heavily from Durbin et al., Biological Sequence Analysis) BCH339N Systems Biology / Bioinformatics Spring 2016 Edward Marcotte,

More information

Fitness landscapes and seascapes

Fitness landscapes and seascapes Fitness landscapes and seascapes Michael Lässig Institute for Theoretical Physics University of Cologne Thanks Ville Mustonen: Cross-species analysis of bacterial promoters, Nonequilibrium evolution of

More information

Hidden Markov Models for biological sequence analysis

Hidden Markov Models for biological sequence analysis Hidden Markov Models for biological sequence analysis Master in Bioinformatics UPF 2017-2018 http://comprna.upf.edu/courses/master_agb/ Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA

More information

Informa(on theory in ML

Informa(on theory in ML Informa(on theory in ML Feature Selec+on For efficiency of the classifier and to suppress noise choose subset of all possible features. Selected features should be frequent to avoid overfitting the classifier

More information

System Biology - Deterministic & Stochastic Dynamical Systems

System Biology - Deterministic & Stochastic Dynamical Systems System Biology - Deterministic & Stochastic Dynamical Systems System Biology - Deterministic & Stochastic Dynamical Systems 1 The Cell System Biology - Deterministic & Stochastic Dynamical Systems 2 The

More information

Stem Cell Reprogramming

Stem Cell Reprogramming Stem Cell Reprogramming Colin McDonnell 1 Introduction Since the demonstration of adult cell reprogramming by Yamanaka in 2006, there has been immense research interest in modelling and understanding the

More information

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Inferring Transcriptional Regulatory Networks from Gene Expression Data II Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday

More information

Identification of ELM evolution patterns with unsupervised clustering of time-series similarity metrics

Identification of ELM evolution patterns with unsupervised clustering of time-series similarity metrics Identification of ELM evolution patterns with unsupervised clustering of time-series similarity metrics David R. Smith 1, G. McKee 1, R. Fonck 1, A. Diallo 2, S. Kaye 2, B. LeBlanc 2, S. Sabbagh 3, and

More information

9. Distance measures. 9.1 Classical information measures. Head Tail. How similar/close are two probability distributions? Trace distance.

9. Distance measures. 9.1 Classical information measures. Head Tail. How similar/close are two probability distributions? Trace distance. 9. Distance measures 9.1 Classical information measures How similar/close are two probability distributions? Trace distance Fidelity Example: Flipping two coins, one fair one biased Head Tail Trace distance

More information

Lineage specific conserved noncoding sequences in plants

Lineage specific conserved noncoding sequences in plants Lineage specific conserved noncoding sequences in plants Nilmini Hettiarachchi Department of Genetics, SOKENDAI National Institute of Genetics, Mishima, Japan 20 th June 2014 Conserved Noncoding Sequences

More information

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16 Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Enduring understanding 3.B: Expression of genetic information involves cellular and molecular

More information

6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution. Lecture 05. Hidden Markov Models Part II

6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution. Lecture 05. Hidden Markov Models Part II 6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution Lecture 05 Hidden Markov Models Part II 1 2 Module 1: Aligning and modeling genomes Module 1: Computational foundations Dynamic programming:

More information

Bio 119 Bacterial Genomics 6/26/10

Bio 119 Bacterial Genomics 6/26/10 BACTERIAL GENOMICS Reading in BOM-12: Sec. 11.1 Genetic Map of the E. coli Chromosome p. 279 Sec. 13.2 Prokaryotic Genomes: Sizes and ORF Contents p. 344 Sec. 13.3 Prokaryotic Genomes: Bioinformatic Analysis

More information

Computing and Communications 2. Information Theory -Entropy

Computing and Communications 2. Information Theory -Entropy 1896 1920 1987 2006 Computing and Communications 2. Information Theory -Entropy Ying Cui Department of Electronic Engineering Shanghai Jiao Tong University, China 2017, Autumn 1 Outline Entropy Joint entropy

More information

Markov Models & DNA Sequence Evolution

Markov Models & DNA Sequence Evolution 7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under

More information

5 Mutual Information and Channel Capacity

5 Mutual Information and Channel Capacity 5 Mutual Information and Channel Capacity In Section 2, we have seen the use of a quantity called entropy to measure the amount of randomness in a random variable. In this section, we introduce several

More information

Chapter 18: Control of Gene Expression

Chapter 18: Control of Gene Expression Chapter 18: Control of Gene Expression 海洋生物研究所 曾令銘 海事大樓 426 室分機 : 5326 Differential Expression of Genes Prokaryotes and eukaryotes precisely regulate gene expression in response to environmental conditions

More information

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1. Motifs and Logos Six Discovering Genomics, Proteomics, and Bioinformatics by A. Malcolm Campbell and Laurie J. Heyer Chapter 2 Genome Sequence Acquisition and Analysis Sami Khuri Department of Computer

More information

CSE468 Information Conflict

CSE468 Information Conflict CSE468 Information Conflict Lecturer: Dr Carlo Kopp, MIEEE, MAIAA, PEng Lecture 02 Introduction to Information Theory Concepts Reference Sources and Bibliography There is an abundance of websites and publications

More information

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology

More information

Markov Chains and Hidden Markov Models. COMP 571 Luay Nakhleh, Rice University

Markov Chains and Hidden Markov Models. COMP 571 Luay Nakhleh, Rice University Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University Markov Chains and Hidden Markov Models Modeling the statistical properties of biological sequences and distinguishing regions

More information

FYTN05/TEK267 Chemical Forces and Self Assembly

FYTN05/TEK267 Chemical Forces and Self Assembly FYTN05/TEK267 Chemical Forces and Self Assembly FYTN05/TEK267 Chemical Forces and Self Assembly 1 Michaelis-Menten (10.3,10.4) M-M formalism can be used in many contexts, e.g. gene regulation, protein

More information

Regulation of Gene Expression

Regulation of Gene Expression Chapter 18 Regulation of Gene Expression Edited by Shawn Lester PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley

More information

Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites

Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites Paper by: James P. Balhoff and Gregory A. Wray Presentation by: Stephanie Lucas Reviewed

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns

More information

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II) CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models

More information

Package FEM. January 24, 2018

Package FEM. January 24, 2018 Type Package Package FEM January 24, 2018 Title Identification of Functional Epigenetic Modules Version 3.6.0 Date 2016-12-05 Author Andrew E. Teschendorff and Zhen Yang Maintainer Zhen Yang

More information

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

Geert Geeven. April 14, 2010

Geert Geeven. April 14, 2010 iction of Gene Regulatory Interactions NDNS+ Workshop April 14, 2010 Today s talk - Outline Outline Biological Background Construction of Predictors The main aim of my project is to better understand the

More information

Regulation of Transcription in Eukaryotes

Regulation of Transcription in Eukaryotes Regulation of Transcription in Eukaryotes Leucine zipper and helix-loop-helix proteins contain DNA-binding domains formed by dimerization of two polypeptide chains. Different members of each family can

More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

Chromatin Structure and Transcriptional Activity Shaping Genomic Landscapes in Eukaryotes

Chromatin Structure and Transcriptional Activity Shaping Genomic Landscapes in Eukaryotes Invisible Cities: Segregated domains in eukaryote genomes Chromatin Structure and Transcriptional Activity Shaping Genomic Landscapes in Eukaryotes Christoforos Nikolaou Computational Genomics Group Dept.

More information

Chapter 2 Review of Classical Information Theory

Chapter 2 Review of Classical Information Theory Chapter 2 Review of Classical Information Theory Abstract This chapter presents a review of the classical information theory which plays a crucial role in this thesis. We introduce the various types of

More information

Hidden Markov Models for biological sequence analysis I

Hidden Markov Models for biological sequence analysis I Hidden Markov Models for biological sequence analysis I Master in Bioinformatics UPF 2014-2015 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Example: CpG Islands

More information

Principles of Communications

Principles of Communications Principles of Communications Weiyao Lin Shanghai Jiao Tong University Chapter 10: Information Theory Textbook: Chapter 12 Communication Systems Engineering: Ch 6.1, Ch 9.1~ 9. 92 2009/2010 Meixia Tao @

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

AP Curriculum Framework with Learning Objectives

AP Curriculum Framework with Learning Objectives Big Ideas Big Idea 1: The process of evolution drives the diversity and unity of life. AP Curriculum Framework with Learning Objectives Understanding 1.A: Change in the genetic makeup of a population over

More information

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information

Molecular evolution - Part 1. Pawan Dhar BII

Molecular evolution - Part 1. Pawan Dhar BII Molecular evolution - Part 1 Pawan Dhar BII Theodosius Dobzhansky Nothing in biology makes sense except in the light of evolution Age of life on earth: 3.85 billion years Formation of planet: 4.5 billion

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

The majority of environmental factors

The majority of environmental factors POINT-OF-VIEW Epigenetics 6:7, 838-842; July 2011; 2011 Landes Bioscience Environmental and somatic epigenetic mitotic stability Michael K. Skinner Center for Reproductive Biology; School of Biological

More information

Asymptotic Capacity Bounds for Magnetic Recording. Raman Venkataramani Seagate Technology (Joint work with Dieter Arnold)

Asymptotic Capacity Bounds for Magnetic Recording. Raman Venkataramani Seagate Technology (Joint work with Dieter Arnold) Asymptotic Capacity Bounds for Magnetic Recording Raman Venkataramani Seagate Technology (Joint work with Dieter Arnold) Outline Problem Statement Signal and Noise Models for Magnetic Recording Capacity

More information

#33 - Genomics 11/09/07

#33 - Genomics 11/09/07 BCB 444/544 Required Reading (before lecture) Lecture 33 Mon Nov 5 - Lecture 31 Phylogenetics Parsimony and ML Chp 11 - pp 142 169 Genomics Wed Nov 7 - Lecture 32 Machine Learning Fri Nov 9 - Lecture 33

More information

10708 Graphical Models: Homework 2

10708 Graphical Models: Homework 2 10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves

More information

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Written Exam 15 December Course name: Introduction to Systems Biology Course no Technical University of Denmark Written Exam 15 December 2008 Course name: Introduction to Systems Biology Course no. 27041 Aids allowed: Open book exam Provide your answers and calculations on separate

More information

Regulation of Gene Expression

Regulation of Gene Expression Chapter 18 Regulation of Gene Expression PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

Generative MaxEnt Learning for Multiclass Classification

Generative MaxEnt Learning for Multiclass Classification Generative Maximum Entropy Learning for Multiclass Classification A. Dukkipati, G. Pandey, D. Ghoshdastidar, P. Koley, D. M. V. S. Sriram Dept. of Computer Science and Automation Indian Institute of Science,

More information

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E REVIEW SESSION Wednesday, September 15 5:30 PM SHANTZ 242 E Gene Regulation Gene Regulation Gene expression can be turned on, turned off, turned up or turned down! For example, as test time approaches,

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 147 Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data Sunduz

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems

More information

Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre

Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre PhD defense Chromosomal rearrangements in mammalian genomes : characterising the breakpoints Claire Lemaitre Laboratoire de Biométrie et Biologie Évolutive Université Claude Bernard Lyon 1 6 novembre 2008

More information

Channel capacity. Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5.

Channel capacity. Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5. Channel capacity Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5. Exercices Exercise session 11 : Channel capacity 1 1. Source entropy Given X a memoryless

More information

Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00.

Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00. Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00. Promoters and Enhancers Systematic discovery of transcriptional regulatory motifs

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Hidden Markov Models. x 1 x 2 x 3 x K

Hidden Markov Models. x 1 x 2 x 3 x K Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K HiSeq X & NextSeq Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization:

More information

Supplemental Information. Inferring Cell-State Transition. Dynamics from Lineage Trees. and Endpoint Single-Cell Measurements

Supplemental Information. Inferring Cell-State Transition. Dynamics from Lineage Trees. and Endpoint Single-Cell Measurements Cell Systems, Volume 3 Supplemental Information Inferring Cell-State Transition Dynamics from Lineage Trees and Endpoint Single-Cell Measurements Sahand Hormoz, Zakary S. Singer, James M. Linton, Yaron

More information

Feature selection and extraction Spectral domain quality estimation Alternatives

Feature selection and extraction Spectral domain quality estimation Alternatives Feature selection and extraction Error estimation Maa-57.3210 Data Classification and Modelling in Remote Sensing Markus Törmä markus.torma@tkk.fi Measurements Preprocessing: Remove random and systematic

More information

Inferring gene regulatory networks

Inferring gene regulatory networks BENG 203: Genomics, Proteomics & Network Biology Inferring gene regulatory networks Trey Ideker Vineet Bafna Reading assignment Gardner, di Bernardo, Lorenz, and Collins. Inferring Genetic Networks and

More information

Independent Component Analysis and Unsupervised Learning

Independent Component Analysis and Unsupervised Learning Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien National Cheng Kung University TABLE OF CONTENTS 1. Independent Component Analysis 2. Case Study I: Speech Recognition Independent

More information

Interpreting Deep Classifiers

Interpreting Deep Classifiers Ruprecht-Karls-University Heidelberg Faculty of Mathematics and Computer Science Seminar: Explainable Machine Learning Interpreting Deep Classifiers by Visual Distillation of Dark Knowledge Author: Daniela

More information

Molecular Interactions F14NMI. Lecture 4: worked answers to practice questions

Molecular Interactions F14NMI. Lecture 4: worked answers to practice questions Molecular Interactions F14NMI Lecture 4: worked answers to practice questions http://comp.chem.nottingham.ac.uk/teaching/f14nmi jonathan.hirst@nottingham.ac.uk (1) (a) Describe the Monte Carlo algorithm

More information

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information. L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate

More information