Information Transfer in Biological Systems


1 Information Transfer in Biological Systems W. Szpankowski, Department of Computer Science, Purdue University, W. Lafayette, IN. May 19, 2009, Cergy Pontoise. Joint work with M. Aktulga, A. Grama, M. Koyuturk, I. Kontoyiannis, L. Lyznik, L. Szpankowski.

2 Outline 1. What is Information? Information in Biology Beyond Shannon Information 2. Information in Correlation: Finding Dependency Biological Signals Mutual Information Alternative Splicing 3. Information in Structures: Network Motifs Biological Networks Finding Biologically Significant Structures Structural Compression (Bonus)

3 What is Information?
C. Shannon: These semantic aspects of communication are irrelevant...
C. F. von Weizsäcker: Information is only that which produces information (relativity). Information is only that which is understood (rationality). Information has no absolute meaning.
F. Brooks, Jr. (JACM, 2003, Three Great Challenges...): We have no theory, however, that gives us a metric for the information embodied in structure... this is the most fundamental gap in the theoretical underpinning of information and computer science.
P. Nurse (Nature, 2008, Life, Logic, and Information): ... opines that the successes of systems biology must be supplemented by deeper investigations into how living systems gather, process, store and use information.

4 Information in Biology
M. Eigen (Treatise on Matter, Information, Life and Thought): The differentiable characteristic of living systems is information. Information assures the controlled reproduction of all constituents, ensuring conservation of viability... Information theory, pioneered by Claude Shannon, cannot answer this question... in principle, the answer was formulated 130 years ago by Charles Darwin.
Some Fundamental Questions:
How is information generated and transferred through the underlying mechanisms of variation and selection (e.g., a Darwin channel)?
How does information in biomolecules (sequences and structures) relate to the organization of the cell?
What structures of a biological network significantly inform us of the presence of a phenotype (disease)?
Are there error-correcting mechanisms (codes) in biomolecules?
How do organisms survive and thrive in noisy environments?
Life is a delicate interplay of energy, entropy, and information; essential functions of living beings correspond to the generation, consumption, processing, preservation, and duplication of information.

5 Beyond Shannon
Participants of the 2005 & 2008 Information Beyond Shannon workshops recognized:
Time: When information is transmitted over networks of gene regulation, protein interactions, or immune response, the associated delay is an important factor.
Space: In molecular interaction networks, spatially distributed components raise fundamental issues of limitations in information exchange, since the available resources must be shared, allocated, and re-used.
Information and Control: Information is exchanged in space and time for decision making; thus timeliness of information delivery, along with reliability and complexity, constitutes the basic objective.
Semantics: How can we represent and measure biological information that is context-dependent?
Learnable Information: How much information can be extracted from a data repository? What information residing in living systems is meaningful?
Limited Resources: The ability to communicate information is often limited by available resources (e.g., the bandwidth of signaling channels). How much information can be extracted and processed with limited resources?
Dynamic Information: In a complex dynamic network, information is not only communicated but also processed and generated (e.g., the human brain).

6 Science of Information

7 Outline Update 1. What is Information? 2. Information in Correlation: Finding Dependency Biological Signals Mutual Information Alternative Splicing 3. Information in Structures: Network Motifs

8 Biological Signals
The biological world is highly stochastic and inhomogeneous (S. Salzberg). Physical limits within a cell are quite severe, so it is likely that the cell has evolved to maximize information capacity (W. Bialek et al.).
[Figure: gene structure schematic showing promoter, transcription start, 5′ UTR, start codon, exons, introns with donor and acceptor splice sites, stop codon, poly-A site, and 3′ UTR.]

9 How to Measure Dependency: Theoretical Background
Question: How do we measure dependency between two parts of DNA (say, whether introns error-control well-preserved exons)?
Suppose we have two strings of unequal lengths,
$$X_1^n = X_1, X_2, \ldots, X_n \quad \text{and} \quad Y_1^M = Y_1, Y_2, Y_3, \ldots, Y_M,$$
where $M \gg n$, taking values in a common finite alphabet $\mathcal{A}$, such that:
- $X_1^n$ is independent and identically distributed with distribution $P(x)$;
- the $Y_i$ are also i.i.d. with a possibly different distribution $Q(y)$;
- $W(y|x)$ is a channel such that $\sum_{x \in \mathcal{A}} P(x) W(y|x) = Q(y)$ for all $y$.
[Diagram: $X_1^n$ and $Y_1^M$ linked by a channel carrying mutual information $I(X;Y)$.]
We wish to differentiate between the following two scenarios:
(I) Independence: $X_1^n$ and $Y_1^M$ are independent.
(II) Dependence: an index $J \in \{1, 2, \ldots, M - n + 1\}$ is chosen and $Y_J^{J+n-1}$ is generated as the output of the channel $W$ with input $X_1^n$.

10 Mutual Information
To distinguish (I) and (II), we compute the mutual information defined as
$$I(X;Y) = \sum_{x,y \in \mathcal{A}} V(x,y) \log \frac{V(x,y)}{P(x)Q(y)},$$
where $V(x,y)$ is the joint distribution of $X$ and $Y$.
More precisely, we compute the empirical mutual information $\hat{I}_j(n)$ between $X_1^n$ and $Y_j^{j+n-1}$, defined as
$$\hat{I}_j(n) = \sum_{x,y \in \mathcal{A}} \hat{p}_j(x,y) \log \frac{\hat{p}_j(x,y)}{\hat{p}(x)\,\hat{q}_j(y)},$$
where $\hat{p}_j(x,y)$ is the joint empirical distribution of $(X_1^n, Y_j^{j+n-1})$, and $\hat{p}(x)$ and $\hat{q}_j(y)$ are the empirical distributions of $X_1^n$ and $Y_j^{j+n-1}$.
We conclude: (I) independence implies $\hat{I}_j(n) \to 0$; (II) dependence implies $\hat{I}_j(n) \to I > 0$.
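The empirical estimator $\hat{I}_j(n)$ above is straightforward to implement. Here is a minimal sketch (the function names `empirical_mi` and `mi_profile` are mine, not from the talk), computing the estimate in bits from empirical counts and sliding the shorter string along the longer one:

```python
import math
from collections import Counter

def empirical_mi(x, y):
    """Empirical mutual information (in bits) between two equal-length
    symbol sequences, from the joint and marginal empirical distributions
    of the aligned pairs (x[i], y[i])."""
    assert len(x) == len(y)
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    qy = Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # (c/n) * log2( (c/n) / ((px[a]/n)*(qy[b]/n)) ) = (c/n)*log2(c*n/(px*qy))
        mi += (c / n) * math.log2(c * n / (px[a] * qy[b]))
    return mi

def mi_profile(x, y):
    """I-hat_j for every length-len(x) window of the longer sequence y."""
    n = len(x)
    return [empirical_mi(x, y[j:j + n]) for j in range(len(y) - n + 1)]
```

For identical sequences this returns the empirical entropy of the common string; for sequences with no empirical correlation it returns values near zero.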

11 Scenario (I): Independence
The multivariate central limit theorem (for $\hat{p}_j$) shows that $nI(n)$ converges in distribution to a (scaled) chi-squared distribution, that is,
$$(2\ln 2)\, n I(n) \xrightarrow{D} Z \sim \chi^2\big((|\mathcal{A}|-1)^2\big),$$
where $Z$ has a $\chi^2$ distribution with $k = (|\mathcal{A}|-1)^2$ degrees of freedom. Therefore,
$$P_{e,1} = \Pr\{\text{declare dependence} \mid \text{independent strings}\} = \Pr\{I(n) > \theta \mid \text{independent strings}\} \approx \Pr\{Z > (2\ln 2)\theta n\},$$
hence
$$P_{e,1} \approx 1 - \frac{\gamma(k, (\theta \ln 2) n)}{\Gamma(k)} \sim \exp\big(-(\theta \ln 2)\, n\big).$$

12 Scenario (II): Dependence
Here $I(n) \to I$ with probability one at rate $1/\sqrt{n}$, that is,
$$\sqrt{n}\,[I(n) - I] \xrightarrow{D} V \sim N(0, \sigma^2),$$
where the resulting variance $\sigma^2$ is given by
$$\sigma^2 = \mathrm{Var}\left[\log \frac{W(Y|X)}{Q(Y)}\right] = \sum_{x,y \in \mathcal{A}} P(x) W(y|x) \log^2 \frac{W(y|x)}{Q(y)} - I^2.$$
Therefore,
$$P_{e,2} = \Pr\{\text{declare independence} \mid W\text{-dependent strings}\} = \Pr\{I(n) \le \theta \mid W\text{-dependent strings}\} \approx \Pr\{V \le [\theta - I]\sqrt{n}\} \le \exp\left(-\frac{(I-\theta)^2}{2\sigma^2}\, n\right).$$

13 Discussion
1. The threshold $\theta$ needs to be strictly between 0 and $I = I(X;Y)$.
2. Both probabilities of error decay exponentially fast, with
$$P_{e,1} \approx \exp(-(\theta \ln 2)\, n) \quad \text{and} \quad P_{e,2} \approx \exp\left(-\frac{(I-\theta)^2}{2\sigma^2}\, n\right).$$
3. Estimating $P_{e,2}$ requires knowledge of the value of $\sigma^2$, which is practically impossible to compute.
4. (Practical Approach.) Since in practice declaring false positives is much more undesirable than overlooking potential dependence, in our experiments we fix an acceptably small false-positive probability $\epsilon$ and then select $\theta$ such that $P_{e,1} \le \epsilon$.
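The practical approach in point 4 can be sketched as follows: rather than inverting the chi-squared tail analytically, one can simulate the null (independent i.i.d. strings), and take the $(1-\epsilon)$-quantile of the simulated $\hat{I}(n)$ values as $\theta$. The helper names and defaults below are my own assumptions, not from the talk; for large $n$ the result should agree with the chi-squared approximation $\theta \approx \chi^2_{1-\epsilon}(k)/(2 n \ln 2)$ with $k = (|\mathcal{A}|-1)^2$.

```python
import math
import random
from collections import Counter

def empirical_mi(x, y):
    # Same estimator as on the Mutual Information slide (in bits).
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    qy = Counter(y)
    return sum((c / n) * math.log2(c * n / (px[a] * qy[b]))
               for (a, b), c in joint.items())

def calibrate_threshold(n, alphabet="ACGT", eps=0.01, trials=2000, seed=1):
    """Pick theta so that Pr{I-hat(n) > theta | independence} <= eps,
    by Monte Carlo simulation of independent i.i.d. string pairs."""
    rng = random.Random(seed)
    null = sorted(empirical_mi([rng.choice(alphabet) for _ in range(n)],
                               [rng.choice(alphabet) for _ in range(n)])
                  for _ in range(trials))
    return null[min(trials - 1, int((1 - eps) * trials))]
```

Any window whose $\hat{I}_j(n)$ exceeds the calibrated $\theta$ is then declared dependent with false-positive probability at most (approximately) $\epsilon$.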

14 Alternative Splicing
Does dependency exist between alternative splicing locations? How can we measure and discover it? Note: a relatively small number of genes must produce some 100,000 proteins.

15 Alternative Splicing in zmsrp32
Figure 1: Alternative splicings of the zmsrp32 gene in maize. The gene consists of a number of exons (shaded boxes) and introns (lines) flanked by the 5′ and 3′ untranslated regions, UTR (white boxes). The gene zmsrp32 is coded by 4735 nucleotides and has four alternative splicing variants. Two of these four variants are due to different splicings of this gene, between positions … and ….

16 Results: Various Groupings
Figure 2: Estimated mutual information between the exon located between bases … and each contiguous subsequence of length 369 in the intron between bases …. The estimates were computed for the standard four-letter alphabet {A, C, G, T} (shown in (a)) and for the corresponding transformed sequences under the two-letter purine/pyrimidine grouping {AG, CT} (shown in (b)). (Axes: mutual information versus base position on the zmsrp32 gene sequence.)

17 Results: Finding Dependency
Figure 3: $\hat{I}_j$ versus $j$ for various groupings of nucleotides. In (a) and (c), mutual information between the exon found between positions 1–78 and the intron located between …, over (a) the four-letter alphabet {A, C, G, T} and (c) the Watson-Crick pairs {AT, CG}. In (b) and (d), mutual information between the exon located between … and the intron between …, for (b) the four-letter alphabet and (d) the two-letter purine/pyrimidine grouping {AG, CT}. (Axes: mutual information versus base position on the zmsrp32 gene sequence.)

18 Conclusions
These results strongly suggest that:
- there is significant dependence between the bases in positions … and certain substrings of the bases in positions …;
- the region … contains the 5′ untranslated sequence (UTR), an intron, and the first protein-coding exon, while the sequence … encodes an intron that undergoes alternative splicing.
We narrow down the mutual information calculations to the 5′ untranslated region (5′ UTR) in positions 1–78 and the 5′ UTR intron in positions …. A close inspection of the resulting mutual information graphs indicates that the dependency is restricted to the alternative exons embedded in the intron sequences in positions … and ….

19 Outline Update 1. What is Information? 2. Information in Correlation 3. Information in Structures: Network Motifs Biological Networks Finding Biologically Significant Structures

20 Protein Interaction Networks
Molecular Interaction Networks: a graph-theoretical abstraction for the organization of the cell.
Protein-protein interactions (PPI network): proteins signal to each other, form complexes to perform a particular function, transport each other in the cell...
It is possible to detect interacting proteins through high-throughput screening, small-scale experiments, and in silico predictions.
[Figure: protein interactions modeled as an undirected graph; the S. cerevisiae PPI network, Jeong et al., Nature, 2001.]

21 Modularity in PPI Networks
A functionally modular group of proteins (e.g., a protein complex) is likely to induce a dense subgraph, so algorithmic approaches target the identification of dense subgraphs.
An important problem: how do we define dense? Statistical approach: what is significantly dense?
[Figure: the RNA Polymerase II complex and the corresponding induced subgraph.]

22 Significance of Dense Subgraphs
A subnet of $r$ proteins is said to be $\rho$-dense if the number of interactions, $F(r)$, between these $r$ proteins satisfies $F(r) \ge \rho r^2$.
What is the expected size, $R_\rho$, of the largest $\rho$-dense subgraph in a random graph? Any $\rho$-dense subgraph of larger size is statistically significant! Maximum clique is a special case of this problem ($\rho = 1$).
G(n, p) model: $n$ proteins, each interaction occurs with probability $p$; simple enough to facilitate rigorous analysis.
Piecewise G(n, p) model: captures the basic characteristics of PPI networks.
Power-law model.

23 Largest Dense Subgraph on G(n, p)
Theorem 1 (Koyuturk, Grama, W.S., 2006). If $G$ is a random graph with $n$ nodes, where every edge exists with probability $p$, then
$$\lim_{n \to \infty} \frac{R_\rho}{\log n} = \frac{1}{H_p(\rho)} \quad \text{(pr.)},$$
where
$$H_p(\rho) = \rho \log \frac{\rho}{p} + (1-\rho) \log \frac{1-\rho}{1-p}$$
denotes divergence. More precisely,
$$P(R_\rho \ge r_0) \le O\left(\frac{\log n}{n^{1/H_p(\rho)}}\right)$$
for large $n$, where
$$r_0 = \frac{\log n - \log \log n + \log H_p(\rho)}{H_p(\rho)}.$$
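The constants in Theorem 1 are directly computable. Below is a small sketch (the helper names `H` and `r0` are mine) that evaluates the divergence $H_p(\rho)$ in bits and the threshold size $r_0$ as stated on the slide; any observed $\rho$-dense subgraph larger than $r_0$ would be flagged as statistically significant:

```python
import math

def H(p, rho):
    """Divergence H_p(rho) = rho*log2(rho/p) + (1-rho)*log2((1-rho)/(1-p)),
    i.e. the KL divergence between Bernoulli(rho) and Bernoulli(p), in bits."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log2(a / b)
    return term(rho, p) + term(1.0 - rho, 1.0 - p)

def r0(n, p, rho):
    """Threshold size from Theorem 1 (as stated on the slide, leading terms
    only): (log n - log log n + log H_p(rho)) / H_p(rho), logs base 2."""
    h = H(p, rho)
    return (math.log2(n) - math.log2(math.log2(n)) + math.log2(h)) / h
```

For the clique case $\rho = 1$, `H(p, 1.0)` reduces to $\log(1/p)$, so the predicted significant size grows like $\log n / \log(1/p)$.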

24 Proof
$X_{r,\rho}$: number of subgraphs of size $r$ with density at least $\rho$,
$$X_{r,\rho} = |\{U \subseteq V(G) : |U| = r,\ F(U) \ge \rho r^2\}|.$$
$Y_r$: number of edges induced by a set $U$ of $r$ vertices,
$$E[X_{r,\rho}] = \binom{n}{r} P(Y_r \ge \rho r^2), \qquad P(Y_r \ge \rho r^2) \le (r^2 - \rho r^2) \binom{r^2}{\rho r^2} p^{\rho r^2} (1-p)^{r^2 - \rho r^2}.$$
Upper bound: by the first moment method, $P(R_\rho \ge r) \le P(X_{r,\rho} \ge 1) \le E[X_{r,\rho}]$; plug in Stirling's formula for the appropriate regimes.
Lower bound: to use the second moment method, we have to account for dependencies in terms of nodes and existing edges; use Stirling's formula and plug in continuous variables for the range of dependencies.

25 Piecewise G(n, p) Model
Few proteins with many interacting partners; many proteins with few interacting partners. Captures the basic characteristics of PPI networks, where $p_l < p_b < p_h$.
$G(V, E)$, $V = V_h \cup V_l$ such that $n_h = |V_h| \ll |V_l| = n_l$, and
$$P(uv \in E(G)) = \begin{cases} p_h & \text{if } u, v \in V_h, \\ p_l & \text{if } u, v \in V_l, \\ p_b & \text{if } u \in V_h, v \in V_l \text{ or } u \in V_l, v \in V_h. \end{cases}$$
If $n_h = O(1)$, then
$$P(R_\rho \ge r_1) \le O\left(\frac{\log n}{n^{1/H_{p_l}(\rho)}}\right),$$
where
$$r_1 = \frac{\log n - \log \log n + 2 n_h \log B + \log H_{p_l}(\rho) - \log e + 1}{H_{p_l}(\rho)}$$
and $B = p_b(1 - p_l)/p_l + (1 - p_b)$.

26 SIDES
An algorithm for identification of Significantly Dense Subgraphs (SIDES).
Based on the Highly Connected Subgraphs algorithm (Hartuv & Shamir, 2000): a recursive min-cut partitioning heuristic. We use statistical significance as the stopping criterion.
[Figure: recursive min-cut partitioning; cut edges occur with probability p << 1.]

27 Behavior of Largest Dense Subgraph Across Species
[Figure: number of proteins (log scale) versus size of the largest dense subgraph for PPI networks belonging to 9 eukaryotic species (BT, AT, OS, RN, HP, EC, MM, CE, DM, HS, SC), for ρ = 0.5 and ρ = 1.0, observed versus the G(n, p) and piecewise model predictions.]

28 Behavior of Largest Dense Subgraph w.r.t. Density
[Figure: density threshold versus size of the largest dense subgraph for the Yeast (S. cerevisiae) and Human (H. sapiens) PPI networks, observed versus the G(n, p) and piecewise model predictions.]

29 Performance of SIDES
Biological relevance of identified clusters is assessed with respect to Gene Ontology (GO): find GO terms that are significantly enriched in each cluster.
Quality of the clusters: for a GO term $t$ that is significantly enriched in cluster $C$, let $T$ be the set of proteins associated with $t$. Then
$$\text{specificity} = 100 \cdot \frac{|C \cap T|}{|C|}, \qquad \text{sensitivity} = 100 \cdot \frac{|C \cap T|}{|T|}.$$
Comparison of SIDES with MCODE (Bader & Hogue, 2003) on the yeast PPI network derived from the DIP and BIND databases.
[Table: minimum, maximum, and average specificity (%) and sensitivity (%) for SIDES and MCODE.]
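The specificity and sensitivity measures reduce to set intersections. A minimal sketch (function names are mine, not from the talk), treating a cluster C and a GO term's protein set T as Python sets:

```python
def specificity(cluster, annotated):
    """Percent of cluster members carrying the GO term: 100*|C ∩ T|/|C|."""
    return 100.0 * len(cluster & annotated) / len(cluster)

def sensitivity(cluster, annotated):
    """Percent of the term's proteins captured by the cluster: 100*|C ∩ T|/|T|."""
    return 100.0 * len(cluster & annotated) / len(annotated)
```

For example, a 4-protein cluster sharing 3 proteins with a 5-protein GO term set has specificity 75% and sensitivity 60%.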

30 Performance of SIDES
[Figure: specificity (%) and sensitivity (%) versus cluster size (number of proteins) for SIDES and MCODE. Correlation between cluster size and specificity: SIDES 0.22, MCODE …; between cluster size and sensitivity: SIDES 0.27, MCODE 0.36.]

31 THANK YOU

32 Outline Update 1. What is Information? 2. Information in Correlation 3. Information in Structures: Network Motifs Biological Networks Finding Biologically Significant Structures Structural Compression

33 Structural Information
The (descriptive) entropy of a random (labeled) graph process $G$ is
$$H_G = E[-\log P(G)] = -\sum_{G \in \mathcal{G}} P(G) \log P(G),$$
where $P(G)$ is the probability of a graph $G$.
A random structure model is defined for an unlabeled version; some labeled graphs have the same structure.
[Figure: labeled graphs $G_1, \ldots, G_8$ grouped into structures $S_1, \ldots, S_4$; Random Graph Model versus Random Structure Model.]
The probability of a structure $S$ is
$$P(S) = N(S) \cdot P(G),$$
where $N(S)$ is the number of different labeled graphs having the same structure.
The entropy of a random structure $S$ can be defined as
$$H_S = E[-\log P(S)] = -\sum_{S \in \mathcal{S}} P(S) \log P(S),$$
where the summation is over all distinct structures.
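The grouping of labeled graphs into structures can be enumerated exhaustively for tiny $n$, as a sanity check on $P(S) = N(S) \cdot P(G)$. Below is a brute-force sketch (the function name `structures` and the canonicalization-by-relabeling are my own, not part of the talk): for $n = 3$ it yields four structures with $N(S) = 1, 3, 3, 1$, consistent with $N(S) = n!/|\mathrm{Aut}(S)|$.

```python
from itertools import combinations, permutations

def structures(n):
    """Group all 2^C(n,2) labeled graphs on n vertices into unlabeled
    structures, returning {canonical form: N(S)} where N(S) counts the
    labeled graphs sharing that structure."""
    edges = list(combinations(range(n), 2))
    counts = {}
    for mask in range(1 << len(edges)):
        g = [e for i, e in enumerate(edges) if mask >> i & 1]
        # Canonical form: lexicographically smallest edge set over all
        # relabelings of the vertices.
        canon = min(
            tuple(sorted(tuple(sorted((relabel[u], relabel[v]))) for u, v in g))
            for relabel in ({v: k for k, v in enumerate(perm)}
                            for perm in permutations(range(n))))
        counts[canon] = counts.get(canon, 0) + 1
    return counts
```

This brute force is only viable for very small $n$ (the number of labeled graphs grows as $2^{\binom{n}{2}}$), but it makes the labeled-vs-structural distinction concrete.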

34 Relationship between H_G and H_S
Two labeled graphs $G_1$ and $G_2$ are called isomorphic if and only if there is a one-to-one map from $V(G_1)$ onto $V(G_2)$ which preserves adjacency.
Graph Automorphism: for a graph $G$, an automorphism is an adjacency-preserving permutation of the vertices of $G$. The collection $\mathrm{Aut}(G)$ of all automorphisms of $G$ is called the automorphism group of $G$.
[Figure: a graph on vertices a, b, c, d, e illustrating an automorphism.]

35 Relationship between H_G and H_S
Lemma 1. If all isomorphic graphs have the same probability, then
$$H_S = H_G - \log n! + \sum_{S \in \mathcal{S}} P(S) \log |\mathrm{Aut}(S)|,$$
where $\mathrm{Aut}(S)$ is the automorphism group of $S$.
Proof idea: use the fact that $N(S) = n! / |\mathrm{Aut}(S)|$.

36 Erdős–Rényi Graph Model
Our random structure model is the unlabeled version of the binomial random graph model, known also as the Erdős–Rényi model. The binomial random graph model G(n, p) generates graphs with $n$ vertices, where edges are chosen independently with probability $p$. If a graph $G$ in G(n, p) has $k$ edges, then
$$P(G) = p^k q^{\binom{n}{2} - k}, \quad \text{where } q = 1 - p.$$
Theorem 2 (Y. Choi and W.S., 2008). For large $n$ and all $p$ satisfying $p \ge \frac{\ln n}{n}$ and $1 - p \ge \frac{\ln n}{n}$ (i.e., the graph is connected w.h.p.),
$$H_S = \binom{n}{2} h(p) - \log n! + o(1),$$
where $h(p) = -p \log p - (1-p) \log(1-p)$ is the entropy rate. Thus
$$H_S = \binom{n}{2} h(p) - n \log n + n \log e - \frac{1}{2} \log n - \frac{1}{2} \log(2\pi) + o(1).$$
Proof idea:
1. $H_S = H_G - \log n! + \sum_{S \in \mathcal{S}} P(S) \log |\mathrm{Aut}(S)|$.
2. $H_G = \binom{n}{2} h(p)$.
3. $\sum_{S \in \mathcal{S}} P(S) \log |\mathrm{Aut}(S)| = o(1)$ by asymmetry of G(n, p).
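Theorem 2 is easy to evaluate numerically. The sketch below (helper names `h`, `H_G`, `H_S` are mine) computes the labeled entropy $\binom{n}{2} h(p)$ and the structural entropy with the $-\log n!$ correction, dropping the $o(1)$ term; for $n = 1000$, $p = 1/2$, the saving from discarding labels, $\log_2 1000!$, comes to roughly 8,500 bits out of 499,500.

```python
import math

def h(p):
    """Binary entropy rate in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def H_G(n, p):
    """Entropy of the labeled binomial random graph: C(n,2) * h(p) bits."""
    return math.comb(n, 2) * h(p)

def H_S(n, p):
    """Structural (unlabeled) entropy per Theorem 2, with the o(1) term
    dropped: C(n,2) h(p) - log2(n!)."""
    log2_factorial = math.lgamma(n + 1) / math.log(2)  # log2(n!)
    return H_G(n, p) - log2_factorial
```

Using `lgamma` avoids computing the huge integer $n!$ itself while keeping double precision on its logarithm.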

37 Compression Algorithm
Our compression algorithm is called Structural zip, in short SZIP. Demo.

38 Compression Algorithm
We can prove the following estimate on the compression ratio of our algorithm SZIP for structures from S(n, p).
Theorem 3 (Y. Choi and W.S., 2008). The total expected length of the encoding produced by our algorithm SZIP for a structure $S$ from S(n, p) is at most
$$\binom{n}{2} h(p) - n \log n + n (c + \Phi(\log n)) + O(n^{1-\eta}),$$
where $h(p) = -p \log p - (1-p) \log(1-p)$, $c$ is an explicitly computable constant, $\eta$ is a positive constant, and $\Phi(x)$ is a fluctuating function with a small amplitude or zero.
Our algorithm is asymptotically optimal up to the second largest term, and works quite well in practice.

39 Experimental Results
Real-world and random graphs.
Table 1: The length of encodings (in bits)

Network                        # nodes   # edges   our algorithm   adjacency matrix   adjacency list   arithmetic coding
US Airports                        332     2,126           8,118             54,946           38,268              12,991
Protein interaction (Yeast)      2,361     6,646          46,912           2,785,…              …,504             67,488
Collaboration (Geometry)         6,167       21,…        115,365          19,012,…                  …              …,811
Collaboration (Erdös)            6,935    11,857          62,617          24,043,…                  …              …,377
Genetic interaction (Human)      8,605       26,…        221,199          37,018,…                  …              …,569
Internet (AS level)             25,881       52,…              …        334,900,140           1,572,…              …,060
S(n, p), p = …                   1,000         …               …              …,500            99,900             40,350
S(n, p), p = …                   1,000         …               …                  …                 …              …,392
S(n, p), p = …                   1,000         …               …              …,500           2,997,…             …,252

($n$: number of vertices, $e$: number of edges. Adjacency matrix: $\binom{n}{2}$ bits. Adjacency list: $2e \log n$ bits. Arithmetic coding: $\binom{n}{2} h(p)$ bits, compressing the adjacency matrix.)
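The baseline columns in the footer of Table 1 can be recomputed directly from $n$ and $e$. In this sketch (the function `encoding_lengths` is my own, with $p$ estimated as $e/\binom{n}{2}$ and $\lceil \log_2 n \rceil$ bits per vertex id in the list encoding), the US Airports row reproduces the adjacency-matrix and adjacency-list entries exactly and comes within a fraction of a percent of the reported arithmetic-coding length:

```python
import math

def encoding_lengths(n, e):
    """Baseline encoding lengths in bits for a graph with n vertices and
    e edges: adjacency matrix C(n,2); adjacency list 2e*ceil(log2 n);
    arithmetic coding C(n,2)*h(p) with p = e/C(n,2)."""
    pairs = math.comb(n, 2)
    p = e / pairs
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return {
        "adjacency_matrix": pairs,
        "adjacency_list": 2 * e * math.ceil(math.log2(n)),
        "arithmetic_coding": pairs * h,
    }
```

The arithmetic-coding figure is the entropy lower bound for compressing the adjacency matrix bit by bit; SZIP beats it because it additionally discards the $\approx n \log n$ bits of labeling information.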


More information

Stein s Method for concentration inequalities

Stein s Method for concentration inequalities Number of Triangles in Erdős-Rényi random graph, UC Berkeley joint work with Sourav Chatterjee, UC Berkeley Cornell Probability Summer School July 7, 2009 Number of triangles in Erdős-Rényi random graph

More information

Upper Bounds on the Capacity of Binary Intermittent Communication

Upper Bounds on the Capacity of Binary Intermittent Communication Upper Bounds on the Capacity of Binary Intermittent Communication Mostafa Khoshnevisan and J. Nicholas Laneman Department of Electrical Engineering University of Notre Dame Notre Dame, Indiana 46556 Email:{mhoshne,

More information

Molecular evolution - Part 1. Pawan Dhar BII

Molecular evolution - Part 1. Pawan Dhar BII Molecular evolution - Part 1 Pawan Dhar BII Theodosius Dobzhansky Nothing in biology makes sense except in the light of evolution Age of life on earth: 3.85 billion years Formation of planet: 4.5 billion

More information

Principles of Communications

Principles of Communications Principles of Communications Weiyao Lin Shanghai Jiao Tong University Chapter 10: Information Theory Textbook: Chapter 12 Communication Systems Engineering: Ch 6.1, Ch 9.1~ 9. 92 2009/2010 Meixia Tao @

More information

Connectivity. Corollary. GRAPH CONNECTIVITY is not FO definable

Connectivity. Corollary. GRAPH CONNECTIVITY is not FO definable Connectivity Corollary. GRAPH CONNECTIVITY is not FO definable Connectivity Corollary. GRAPH CONNECTIVITY is not FO definable If A is a linear order of size n, let G(A) be the graph with edges { i, i+2

More information

Lecture 2: August 31

Lecture 2: August 31 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy

More information

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, etworks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Coding into a source: an inverse rate-distortion theorem

Coding into a source: an inverse rate-distortion theorem Coding into a source: an inverse rate-distortion theorem Anant Sahai joint work with: Mukul Agarwal Sanjoy K. Mitter Wireless Foundations Department of Electrical Engineering and Computer Sciences University

More information

CS 630 Basic Probability and Information Theory. Tim Campbell

CS 630 Basic Probability and Information Theory. Tim Campbell CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)

More information

Adventures in random graphs: Models, structures and algorithms

Adventures in random graphs: Models, structures and algorithms BCAM January 2011 1 Adventures in random graphs: Models, structures and algorithms Armand M. Makowski ECE & ISR/HyNet University of Maryland at College Park armand@isr.umd.edu BCAM January 2011 2 Complex

More information

Capacity of AWGN channels

Capacity of AWGN channels Chapter 3 Capacity of AWGN channels In this chapter we prove that the capacity of an AWGN channel with bandwidth W and signal-tonoise ratio SNR is W log 2 (1+SNR) bits per second (b/s). The proof that

More information

Introduction to Hidden Markov Models (HMMs)

Introduction to Hidden Markov Models (HMMs) Introduction to Hidden Markov Models (HMMs) But first, some probability and statistics background Important Topics 1.! Random Variables and Probability 2.! Probability Distributions 3.! Parameter Estimation

More information

Graph Alignment and Biological Networks

Graph Alignment and Biological Networks Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

Biclustering Gene-Feature Matrices for Statistically Significant Dense Patterns

Biclustering Gene-Feature Matrices for Statistically Significant Dense Patterns Biclustering Gene-Feature Matrices for Statistically Significant Dense Patterns Mehmet Koyutürk, Wojciech Szpankowski, and Ananth Grama Dept. of Computer Sciences, Purdue University West Lafayette, IN

More information

A Multiobjective GO based Approach to Protein Complex Detection

A Multiobjective GO based Approach to Protein Complex Detection Available online at www.sciencedirect.com Procedia Technology 4 (2012 ) 555 560 C3IT-2012 A Multiobjective GO based Approach to Protein Complex Detection Sumanta Ray a, Moumita De b, Anirban Mukhopadhyay

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

Phylogenetic Analysis of Molecular Interaction Networks 1

Phylogenetic Analysis of Molecular Interaction Networks 1 Phylogenetic Analysis of Molecular Interaction Networks 1 Mehmet Koyutürk Case Western Reserve University Electrical Engineering & Computer Science 1 Joint work with Sinan Erten, Xin Li, Gurkan Bebek,

More information

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK DEPARTMENT: ECE SEMESTER: IV SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A 1. What is binary symmetric channel (AUC DEC

More information

Casting Polymer Nets To Optimize Molecular Codes

Casting Polymer Nets To Optimize Molecular Codes Casting Polymer Nets To Optimize Molecular Codes The physical language of molecules Mathematical Biology Forum (PRL 2007, PNAS 2008, Phys Bio 2008, J Theo Bio 2007, E J Lin Alg 2007) Biological information

More information

1 Mechanistic and generative models of network structure

1 Mechanistic and generative models of network structure 1 Mechanistic and generative models of network structure There are many models of network structure, and these largely can be divided into two classes: mechanistic models and generative or probabilistic

More information

Lecture 6: The feed-forward loop (FFL) network motif

Lecture 6: The feed-forward loop (FFL) network motif Lecture 6: The feed-forward loop (FFL) network motif Chapter 4 of Alon x 4. Introduction x z y z y Feed-forward loop (FFL) a= 3-node feedback loop (3Loop) a=3 Fig 4.a The feed-forward loop (FFL) and the

More information

Quiz 2 Date: Monday, November 21, 2016

Quiz 2 Date: Monday, November 21, 2016 10-704 Information Processing and Learning Fall 2016 Quiz 2 Date: Monday, November 21, 2016 Name: Andrew ID: Department: Guidelines: 1. PLEASE DO NOT TURN THIS PAGE UNTIL INSTRUCTED. 2. Write your name,

More information

Multiple Choice Review- Eukaryotic Gene Expression

Multiple Choice Review- Eukaryotic Gene Expression Multiple Choice Review- Eukaryotic Gene Expression 1. Which of the following is the Central Dogma of cell biology? a. DNA Nucleic Acid Protein Amino Acid b. Prokaryote Bacteria - Eukaryote c. Atom Molecule

More information

Measuring TF-DNA interactions

Measuring TF-DNA interactions Measuring TF-DNA interactions How is Biological Complexity Achieved? Mediated by Transcription Factors (TFs) 2 Regulation of Gene Expression by Transcription Factors TF trans-acting factors TF TF TF TF

More information

Broadcasting With Side Information

Broadcasting With Side Information Department of Electrical and Computer Engineering Texas A&M Noga Alon, Avinatan Hasidim, Eyal Lubetzky, Uri Stav, Amit Weinstein, FOCS2008 Outline I shall avoid rigorous math and terminologies and be more

More information

Graph Theory and Networks in Biology arxiv:q-bio/ v1 [q-bio.mn] 6 Apr 2006

Graph Theory and Networks in Biology arxiv:q-bio/ v1 [q-bio.mn] 6 Apr 2006 Graph Theory and Networks in Biology arxiv:q-bio/0604006v1 [q-bio.mn] 6 Apr 2006 Oliver Mason and Mark Verwoerd February 4, 2008 Abstract In this paper, we present a survey of the use of graph theoretical

More information

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006)

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006) MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK SATELLITE COMMUNICATION DEPT./SEM.:ECE/VIII UNIT V PART-A 1. What is binary symmetric channel (AUC DEC 2006) 2. Define information rate? (AUC DEC 2007)

More information

Representation of Correlated Sources into Graphs for Transmission over Broadcast Channels

Representation of Correlated Sources into Graphs for Transmission over Broadcast Channels Representation of Correlated s into Graphs for Transmission over Broadcast s Suhan Choi Department of Electrical Eng. and Computer Science University of Michigan, Ann Arbor, MI 80, USA Email: suhanc@eecs.umich.edu

More information

Towards Detecting Protein Complexes from Protein Interaction Data

Towards Detecting Protein Complexes from Protein Interaction Data Towards Detecting Protein Complexes from Protein Interaction Data Pengjun Pei 1 and Aidong Zhang 1 Department of Computer Science and Engineering State University of New York at Buffalo Buffalo NY 14260,

More information

Spectral Methods for Subgraph Detection

Spectral Methods for Subgraph Detection Spectral Methods for Subgraph Detection Nadya T. Bliss & Benjamin A. Miller Embedded and High Performance Computing Patrick J. Wolfe Statistics and Information Laboratory Harvard University 12 July 2010

More information

On Third-Order Asymptotics for DMCs

On Third-Order Asymptotics for DMCs On Third-Order Asymptotics for DMCs Vincent Y. F. Tan Institute for Infocomm Research (I R) National University of Singapore (NUS) January 0, 013 Vincent Tan (I R and NUS) Third-Order Asymptotics for DMCs

More information

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Written Exam 15 December Course name: Introduction to Systems Biology Course no Technical University of Denmark Written Exam 15 December 2008 Course name: Introduction to Systems Biology Course no. 27041 Aids allowed: Open book exam Provide your answers and calculations on separate

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

Information Recovery from Pairwise Measurements

Information Recovery from Pairwise Measurements Information Recovery from Pairwise Measurements A Shannon-Theoretic Approach Yuxin Chen, Changho Suh, Andrea Goldsmith Stanford University KAIST Page 1 Recovering data from correlation measurements A large

More information

1 Introduction to information theory

1 Introduction to information theory 1 Introduction to information theory 1.1 Introduction In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through

More information

Block 2: Introduction to Information Theory

Block 2: Introduction to Information Theory Block 2: Introduction to Information Theory Francisco J. Escribano April 26, 2015 Francisco J. Escribano Block 2: Introduction to Information Theory April 26, 2015 1 / 51 Table of contents 1 Motivation

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!

More information

A One-to-One Code and Its Anti-Redundancy

A One-to-One Code and Its Anti-Redundancy A One-to-One Code and Its Anti-Redundancy W. Szpankowski Department of Computer Science, Purdue University July 4, 2005 This research is supported by NSF, NSA and NIH. Outline of the Talk. Prefix Codes

More information

Shortest Paths & Link Weight Structure in Networks

Shortest Paths & Link Weight Structure in Networks Shortest Paths & Link Weight Structure in etworks Piet Van Mieghem CAIDA WIT (May 2006) P. Van Mieghem 1 Outline Introduction The Art of Modeling Conclusions P. Van Mieghem 2 Telecommunication: e2e A ETWORK

More information

A = A U. U [n] P(A U ). n 1. 2 k(n k). k. k=1

A = A U. U [n] P(A U ). n 1. 2 k(n k). k. k=1 Lecture I jacques@ucsd.edu Notation: Throughout, P denotes probability and E denotes expectation. Denote (X) (r) = X(X 1)... (X r + 1) and let G n,p denote the Erdős-Rényi model of random graphs. 10 Random

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 12: Gaussian Belief Propagation, State Space Models and Kalman Filters Guest Kalman Filter Lecture by

More information

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

(each row defines a probability distribution). Given n-strings x X n, y Y n we can use the absence of memory in the channel to compute

(each row defines a probability distribution). Given n-strings x X n, y Y n we can use the absence of memory in the channel to compute ENEE 739C: Advanced Topics in Signal Processing: Coding Theory Instructor: Alexander Barg Lecture 6 (draft; 9/6/03. Error exponents for Discrete Memoryless Channels http://www.enee.umd.edu/ abarg/enee739c/course.html

More information

Consistency Under Sampling of Exponential Random Graph Models

Consistency Under Sampling of Exponential Random Graph Models Consistency Under Sampling of Exponential Random Graph Models Cosma Shalizi and Alessandro Rinaldo Summary by: Elly Kaizar Remember ERGMs (Exponential Random Graph Models) Exponential family models Sufficient

More information

Simulations. . p.1/25

Simulations. . p.1/25 Simulations Computer simulations of realizations of random variables has become indispensable as supplement to theoretical investigations and practical applications.. p.1/25 Simulations Computer simulations

More information

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it? Proteomics What is it? Reveal protein interactions Protein profiling in a sample Yeast two hybrid screening High throughput 2D PAGE Automatic analysis of 2D Page Yeast two hybrid Use two mating strains

More information

Ch. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited

Ch. 8 Math Preliminaries for Lossy Coding. 8.4 Info Theory Revisited Ch. 8 Math Preliminaries for Lossy Coding 8.4 Info Theory Revisited 1 Info Theory Goals for Lossy Coding Again just as for the lossless case Info Theory provides: Basis for Algorithms & Bounds on Performance

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

ELEC546 Review of Information Theory

ELEC546 Review of Information Theory ELEC546 Review of Information Theory Vincent Lau 1/1/004 1 Review of Information Theory Entropy: Measure of uncertainty of a random variable X. The entropy of X, H(X), is given by: If X is a discrete random

More information

Functional Characterization and Topological Modularity of Molecular Interaction Networks

Functional Characterization and Topological Modularity of Molecular Interaction Networks Functional Characterization and Topological Modularity of Molecular Interaction Networks Jayesh Pandey 1 Mehmet Koyutürk 2 Ananth Grama 1 1 Department of Computer Science Purdue University 2 Department

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d

More information

Random Graphs. 7.1 Introduction

Random Graphs. 7.1 Introduction 7 Random Graphs 7.1 Introduction The theory of random graphs began in the late 1950s with the seminal paper by Erdös and Rényi [?]. In contrast to percolation theory, which emerged from efforts to model

More information